

A JOURNAL DEVOTED TO THE DEVELOPMENT
OF PSYCHOLOGY AS A
QUANTITATIVE RATIONAL SCIENCE


















































THE PSYCHOMETRIC SOCIETY - ORGANIZED IN 193 








VOLUME 15
NUMBER 4

DECEMBER
1950











PSYCHOMETRIKA, the official journal of the Psychometric Society, is devoted to
the development of psychology as a quantitative rational science. Issued four
times a year, on March 15, June 15, September 15, and December 15.


DECEMBER 1950, VOLUME 15, NUMBER 4 


Printed for the Psychometric Society at 28 West Colorado Avenue, Colorado
Springs, Colorado. Entered as second class matter, September 17, 1940, at the
Post Office of Colorado Springs, Colorado, under the act of March 3, 1879. Editorial
Office, Department of Psychology, The University of North Carolina,
Chapel Hill, North Carolina.


Subscription Price: The regular subscription rate is $10.00 per volume. The sub-
scriber receives each issue as it comes out, and a second complete set for binding
at the end of the year. All annual subscriptions start with the March issue and
cover the calendar year. All back issues are available. The price is $1.25 per
issue or $5.00 per volume (one set only). Members of the Psychometric Society
pay annual dues of $5.00, of which $4.50 is in payment of a subscription to
Psychometrika. Student members of the Psychometric Society pay annual dues
of $3.00, of which $2.70 is in payment for the journal.


Application for membership and student membership in the Psychometric Society,
together with a check for dues for the calendar year in which application is
made, should be sent to


T. GAYLORD ANDREWS, Chairman of the Membership Committee
Department of Psychology, The University of Maryland, College Park,
Maryland


Payments: All bills and orders are payable in advance. Checks covering mem-
bership dues should be made payable to the Psychometric Society. Checks cover-
ing regular subscription to Psychometrika and back issue orders should be made
payable to the Psychometric Corporation. All checks, notices of change of ad-
dress, and business communications should be addressed to


ROBERT L. THORNDIKE, Treasurer, Psychometric Society and Psychometric
Corporation

Teachers College, Columbia University 

New York 27, New York 


Articles on the following subjects are published in Psychometrika: 


(1) the development of quantitative rationale for the solution of psychologi-
cal problems;

(2) general theoretical articles on quantitative methodology in the social and
biological sciences;

(3) new mathematical and statistical techniques for the evaluation of psy-
chological data;

(4) aids in the application of statistical techniques, such as nomographs,
tables, work-sheet layouts, forms, and apparatus;

(5) critiques or reviews of significant studies involving the use of quantita-
tive techniques.


The emphasis is to be placed on articles of type (1), in so far as articles of this
type are available.


In the selection of the articles to be printed in Psychometrika, an effort is made
to obtain objectivity of choice. All manuscripts are received by one person, who


(Continued on the back inside cover page) 






















Psychometrika 





CONTENTS 


A SUPERIOR ROTATIONAL METHOD IN FACTOR ANALYSIS OR
PSYCHOMETRICIANS IN GOVERNMENT SERVICE
DOROTHY C. ADKINS

ESTIMATION OF PARAMETERS IN A TRUNCATED TRIVARIATE
NORMAL DISTRIBUTION
D. F. VOTAW, J. A. RAFFERTY, AND W. L. DEEMER

THE JOHNSON-NEYMAN TECHNIQUE, ITS THEORY AND
APPLICATIONS
PALMER O. JOHNSON AND LEO C. FAY

THE COMPARABILITY OF SCORES FROM THREE MATHEMATICS
TESTS OF THE COLLEGE ENTRANCE EXAMINATION BOARD
DOUGLAS G. SCHULTZ

ON THE EFFECT OF THE CUTTING SCORE WHEN SELECTION IS
PERFORMED AGAINST A DICHOTOMIZED CRITERION
Z. W. BIRNBAUM

MAXIMIZING PREDICTIVE EFFICIENCY FOR A FIXED TOTAL
TESTING TIME
CALVIN W. TAYLOR

A NOTE ON OPTIMAL TEST LENGTH
PAUL HORST

PREDICTED DIFFERENCES AND DIFFERENCES BETWEEN
PREDICTIONS
WILLIAM G. MOLLENKOPF


(Continued) 








VOLUME FIFTEEN DECEMBER 1950 NUMBER FOUR











CONTENTS (Continued) 


DETERMINATION OF THE OPTIMUM NUMBER OF ITEMS TO RETAIN
IN A TEST MEASURING A SINGLE ABILITY - - - 419
B. J. BEDELL

A COMPARISON OF TWO PROCEDURES FOR CALCULATING
DISCRIMINANT FUNCTION COEFFICIENTS - - - 431
JOHN SCHMID, JR.

THE VARIANCE ERROR OF THE P5.-DISCRIMINANT - - - 435
GILBERT L. BETTS

MACHINE SHORT-CUTS IN THE COMPUTATION OF CHI-SQUARE
AND THE CONTINGENCY COEFFICIENT - - - 441
JOHN B. CARROLL AND C. C. BENNETT

JERZY NEYMAN (Editor). Proceedings of the Berkeley Symposium
on Mathematical Statistics and Probability - - - 449
A Review by Churchill Eisenhart

BOOKS RECEIVED - - - 453

REPORT OF THE TREASURER OF THE PSYCHOMETRIC
SOCIETY - - - 455

REPORT OF THE TREASURER OF THE PSYCHOMETRIC
CORPORATION - - - 456


SPECIAL NOTICE


The American Association for the Advancement of Science is to 
meet in Cleveland, Ohio, December 26-30, 1950. Section I, Psychol- 
ogy, is scheduled for 15 sessions on December 28-29, and a joint eve- 
ning program with Section Q, Education, on December 27. Section 
Q has other programs scheduled for December 26-28. Attention is 
also directed to a symposium on “Genetics and Behavior” scheduled 
by Section F, Zoological Sciences, for December 29. Announcements 
and coupons for reservations appear in Science and The Scientific
Monthly, beginning the last of August.

















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


A SUPERIOR ROTATIONAL METHOD IN FACTOR ANALYSIS 
OR 
PSYCHOMETRICIANS IN GOVERNMENT SERVICE* 


DOROTHY C. ADKINS
THE UNIVERSITY OF NORTH CAROLINA 


The topic that I have chosen, “A Superior Rotational Method in 
Factor Analysis or Psychometricians in Government Service,” may
puzzle you at first. But there is an explanation. A study in progress at 
the University of North Carolina requires the interpretation of 16 
factors obtained from a centroid analysis of 66 variables. The task 
that was approached with greatest apprehension was the location of a 
meaningful reference frame. With so many vectors protruding into a 
space of such a large number of dimensions, the problem appeared 
practically insoluble. Hence we sought a rotational method that would
obviate the necessity for the numerous graphs and the matrix multipli- 
cation associated with the commonly accepted techniques. 

Perhaps some of you have had the experience of solving a difficult 
mathematical problem in a state of sleep and later being able to trans- 
cribe the complete solution immediately. A relatively painless approach 
to mathematics, it is the one which enabled me to pass a course in 
Mathematical Probability years ago. I scarcely dared presume that I 
could evoke it in the present emergency. To my astonishment, however, 
it worked. One day the project staff had had a particularly trying ses- 
sion during which we tried in vain to stretch the available funds to 
cover the rotational work. Upon falling asleep in my office chair, I 
dreamed a solution so simple that an ordinary statistical clerk, upon 
mere inspection of the score matrix, could visualize the most interpret-
able reference frame in up to an infinite number of dimensions. This
was beyond our most extravagant hopes, for it rendered obsolete both 
the correlation matrix and the centroid solution. 

Upon awakening, I was able to encompass the solution in a half- 
page manuscript. Despite the fact that I was then as now a very re- 
tiring President, I recognized that it was imperative to get this signal 


*Address of the Retiring President of the Psychometric Society, delivered at
the Annual Meeting at Pennsylvania State College, State College, Pennsylvania, 
September 6, 1950. 



contribution into print posthaste. The manuscript was sent in centup-
licate for clearance to the Adjutant General’s Office, which is sponsor-
ing the study. To make triply sure that it would not give undue aid and
comfort to the enemy, it was also sent to the Navy and the Air Force. 
All three agencies urged abandonment of prior commitments and cus- 
tomary editorial procedures in order to give it immediate publication 
in Psychometrika. 

Still concerned, however, about the possibilities of so powerful a 
tool in the hands of the enemy, I dispatched a wire to Professor Thur- 
stone. Although I was not yet free to release the method to a civilian, I 
nevertheless solicited his advice on whether or not a solution to the 
problem should be regarded as top secret. I shall always be indebted to 
him for devoting a month to deliberation on my query. His studied 
opinion was to doubt whether any approach to a factor analysis prob- 
lem that I had dreamed up would give aid or comfort to anyone any- 
where. While this reassurance was heartening, the solution still looked 
so ingenious that I began to fear that it might rather be ingenuous. So 
I incorporated the manuscript into a night letter for simultaneous 
transmittal to Professors Thurstone, Kelley, Holzinger, and Guilford; 
telephoned across the street to Professor Hotelling; cabled Professor 
Godfrey Thomson and Sir Cyril Burt; and communicated through a 
medium with Professor Spearman. The answers were instantaneous 
and enthusiastic. The respondents were in complete accord that here 
was not only the long-awaited solution to the rotational problem in 
multiple-factor analysis but also the principle that integrated all con- 
flicting factorial methods. 

The whir of a calculating machine penetrated my slumber. I
awoke with perfect awareness that I had just dreamed that I had had 
a dream which revolutionized factor analysis. I recalled vividly that 
after the dream within a dream I had written down the all-enveloping 
principle and communicated with the authorities mentioned a moment 
ago. To my great dismay, however, I realized that I had complete am- 
nesia for the simple solution. Although I have faith that the entire 
dream sequence will be repeated, to date that experience has been de- 
nied me. 

Hence the need for an alternative topic. “Psychometricians in 
Government Service” has, perhaps, the distinction of being the least 
esoteric subject to which any member of the Psychometric Society ever 
has had the temerity to address himself. Actually, as you readily may 
suspect, the major influence in my selection of this topic last week was 
that it would require less reading, writing, and arithmetic than any 
other that occurred to me. More socially acceptable, I trust, will be the 
fact that I am sincerely interested in fostering a better utilization of 




















DOROTHY C. ADKINS 300 


psychometricians by government agencies and, conversely, in motiv- 
ating psychometricians to improve their contributions to government 
service. I am hopeful of capturing your interest to the extent that each 
of you will follow up with positive action one or more of the suggestions 
to be made. 

Let us consider first what the psychometrician has to offer a gov- 
ernment program. The field of personnel alone provides multiple prob- 
lems calling for application of psychometric principles and techniques. 
Employees have to be selected, placed, and classified. Their jobs must 
be analyzed. They have to be trained. Their services need evaluation. 
Their productiveness has to be appraised. Government positions that 
deal with these personnel problems require more than the routine ap- 
plication of familiar formulas. There is ample room for research and 
improvement. If we are not satisfied with progress in the measurement 
of aptitudes and abilities, consider the still greater need for fundamen- 
tal knowledge of personality and temperament. In view of the oft- 
expressed opinion that many more job failures are attributable to per- 
sonality faults than to lack of aptitude or knowledge, here lie great op- 
portunities for enterprising psychometricians. 

Although personnel supplies the more obvious examples of con- 
tributions that psychometricians can make to government, the design 
of experiments in other areas of psychology, the prescription or de- 
vising of appropriate statistical techniques to expedite the proper 
drawing of inferences, and the development and use of large-scale 
computational techniques should not be overlooked. 

While most of us agree that the psychometrician is valuable in 
government, some of us are skeptical about the rewards he may derive 
from public employment. Let me recount a few. Most significant, I be- 
lieve, are the far-reaching effects of government programs. One de- 
cision may influence thousands of individuals. A single blunder may 
cost millions of dollars. Improved methods for selecting personnel may 
increase the efficiency of government many-fold. This type of consider- 
ation is appealing to persons who are influenced more by the service 
than by the profit motive. 

Although persons do not enter government employment with the 
expectation of great wealth, the relatively high salary schedule may be 
regarded as another advantage. I am convinced that many persons em- 
ployed in government below the top administrative level are earning 
more than they could in civilian employment. Lest this remark be mis- 
interpreted, I hasten to add that in my opinion most of the rest of us 
also would have higher earning power in public service. And only the 
individual who lacks ability or initiative need fear a dead-end job in 
government. 










Another desirable feature of public employment is that it affords 
opportunities for research. True, there will always be limits to the kinds 
of investigations that can be undertaken by tax-supported agencies. But 
surely no one will deny the marked advance in recent years. Even the 
old-line agencies are coming to emphasize the type of research that can 
be justified as being closely related to improvement of current prac- 
tices. The volume of such research has increased markedly within the 
past decade. Defense agencies have been able to go much further than 
most of us anticipated in stimulating and financing research of a type 
whose application is considerably more remote. One reason that gov- 
ernment agencies can promote comprehensive research studies is the 
availability of funds. Another condition favorable to research is the 
large supply of human subjects other than college sophomores. 

Despite the favorable aspects of government service, there are 
changes that would lead to the more effective utilization of psycho- 
metricians. 

Almost all government agencies are hampered by the year-to-year 
basis on which they receive appropriations of funds. This makes for 
difficulty in sensible planning of recurring or continuing operations. It 
produces still greater barriers to intelligent programming of extensive 
research ventures. Administrative heads of agencies are naturally re- 
luctant to make commitments for projects that may suffer financial 
curtailment at the whim of a congressman. If funds cannot be ear- 
marked in advance for the completion of a research program, this in 
turn is a hindrance to recruitment of competent personnel. It is poor 
solace to a research team to be told that if monies for a particular study 
are not forthcoming they possibly can be transferred to some other 
position or agency. Our economists are not notably successful in pre- 
dicting national income, even on a short-term basis. But clearly the 
total amount of money in the public purse in even lean years should 
provide enough latitude to guarantee the continuance of more urgent 
research programs for several years. Perhaps the experience of the de- 
fense agencies in getting commitments for funds to honor research 
contracts for periods that extend beyond a given fiscal year may pro- 
vide clues upon which other agencies can act. I am convinced that a 
change in this direction would markedly improve the government as 
a working environment for psychometricians and would significantly 
enhance the morale of government workers in general. Well-directed 
efforts to this end would be expedient. 

On the other hand, one of the major drawbacks to government 
work is that it is so easy to “get in a rut.” A few employees in some 
agencies are granted occasional leave to complete university education. 
I would like to suggest, however, that every four or five years almost all
government workers should absent themselves from their regular posi- 
tions for six months to a year. They might pursue formal education, 
take positions with industry, get teaching assignments in universities, 
interchange with employees of other government agencies, perform 
different duties in their own agencies, broaden their experience 
through travel—anything to keep them aware that there is more than 
one way to approach a job and that their own methods are not indis- 
pensable. It would be difficult and perhaps impossible to force an em- 
ployee to take such leave, especially if it entailed substantial loss of in- 
come. But those of you in key positions in government could exercise 
telling persuasion through such means as making recommendations for 
promotions partially contingent on breadth of education and exper- 
ience. At the same time, you would have to be prepared to make the at- 
tendant adjustments in your own programs, maintaining a staff suffi- 
ciently flexible that the regular work would get done and facing the 
fact that some employees would not return to your fold. In time, you 
would need to facilitate the temporary absorption of other employees 
into your program. Implementation of such a plan would also require 
the support and cooperation of university personnel. 

This leads to another point. Apart from any possibility of expan- 
sion of services, there are not enough qualified psychometricians to 
fill existing positions, both in government and elsewhere. Some of us 
may have deterred superior students from specializing in psychomet- 
rics by over-emphasizing its abstruse qualities. Perhaps, rather, each 
of us could expend much greater effort in attracting able students and 
encouraging a portion of them to enter public service. 

In order to utilize psychometricians in government more effective- 
ly, those in supervisory and administrative roles are beginning to spon- 
sor a wide variety of continuous training programs. These of course 
should treat any problems peculiar to the agency in question. They can 
well include effective means of written and oral communication. More 
importantly, however, they should force the employee to keep abreast 
of current developments in his field. High standards should be main- 
tained, and training should be to the mastery level. If at all feasible, 
work assignments should be geared to the training in such a way as to 
require command of the material assigned. 

Such a program should be accompanied by conscious exertion to 
provide sound training in supervisory and administrative techniques. 
Too often the assumption is made that one learns how to supervise 
merely by being in a subordinate position and that he will gain insight 
into principles of administration simply by being a member of a going 
organization. Probably training in supervision and administration has 
to come in large part through experience, but it must be thoughtfully
directed experience. Some psychometricians profess boredom and im- 
patience with administrative work. There can be little doubt, however, 
that progress toward the goals of the profession in government will 
depend upon the extent to which psychometricians competently fill 
strategic administrative positions. For only under such circumstances 
are they likely to have the authority commensurate with the responsi- 
bility for technical decisions. 

Administration has many facets, which I am not attempting to de- 
tail here. Let me mention, however, that the charge I have most fre- 
quently heard leveled at psychometricians in government circles is their 
impracticality. The criticism is not always unfounded. We are prone to 
overlook that a method ideal for 100 cases may become unwieldy for
100,000. Or we become so intent on doing a job by the best possible 
method that we do not get it done at all. It is difficult to accept that a 
job accomplished on time is preferable to a theoretically superior job 
completed too late. But the frequent ascription of impracticality to psy- 
chometricians must be combatted. This requires strict adherence to a 
rule that no dead-line ever shall be missed. Meeting time commitments 
becomes a first order of business. Once this is insured, any temporal 
margin can be devoted to refinements. Of course, this policy may neces- 
sitate compromises with ideal procedures. Nonetheless, considering 
the repercussions that would ensue from missing just one dead-line— 
having copies of a test ready for distribution to 50,000 or more appli- 
cants, for example—I am convinced that such a policy is essential to 
an acceptable status for psychometricians in government. 

A related point of view that I have encountered is that psycho- 
metricians tend to gloss over technical questions in discussions with 
persons in coordinate or higher-level administrative positions. It is 
true that the intricacies of some of our methods are beyond the ken of 
lay administrators. On the other hand, as long as their positions entail 
responsibility for a psychometric program, they have an obligation to 
raise questions concerning the general nature of techniques being pro- 
posed, their purposes, the time and cost they demand, and the relative 
gains to be anticipated from them. To provide such interpretation 
in an accurate way requires skill in psychometrics as well as in clear 
expression. It often demands more—a willingness to present the merits 
of alternatives without a bias in favor of a course of action that may be 
technically the best but practically less desirable than some other. As 
a backlog of understanding develops, the necessity for detailed justifi- 
cation of recommendations is reduced, until finally perhaps complete 
responsibility for technical matters rests with the psychometrician. 
Thus training in the adequate interpretation of technical subjects must 
embrace attitudes that can have a critical effect on the success or failure
of the psychometrician in government.

One other aspect of the relation of psychometricians to govern- 
ment service warrants comment. Some individuals are convinced that 
acceptance of a government position is accompanied by loss of status 
as a member of a professional group. They suspect that they will be 
unable to read professional literature, write professional articles, at- 
tend professional meetings, or work 65 to 70 hours a week. 

In my period of government service I saw little evidence to sub- 
stantiate these fears. Library facilities in Federal agencies are in gen- 
eral excellent, and reference materials not at hand can be obtained in 
short order from the Library of Congress or another library. It is rec- 
ognizably easy for government psychometricians to fall into a pattern 
of neglecting professional literature. With efficient planning, however, 
the majority could free sufficient time to devote an hour or two per 
day of the regular 40-hour week to professional reading. Then, too, 
some agencies have training programs which require professional 
reading. 

Moreover, government agencies are tending increasingly to en- 
courage professional writing, in many cases again on official time. As 
you may know, manuscripts for publication ordinarily have to have 
official clearance from the standpoint of agency policy. In the organi- 
zations for which I worked, however, I can recall no employee who had 
any undue difficulty on this score. In fact, the articles were usually im- 
proved in the process of clearance. 

Most government agencies employing psychometricians grant offi- 
cial leave as required for attendance at professional meetings. They 
have quite liberal although varying policies about payment of expenses 
of those desiring to attend such meetings, depending largely on budget- 
ary considerations. Nor does government employment preclude the 
holding of offices in professional organizations. 

Let us recognize, too, that government psychometricians can and 
often do work longer than the required 40 hours per week. Many of 
them readily fall into the habit of a regular work day, however, and 
come to regard it as productive of greater efficiency than results from 
a more sporadic working schedule. 

I have attempted to show some of the changes that you as individ- 
ual members of the professional group can help to effect. There remain 
for brief consideration what actions might be appropriate for the Psy- 
chometric Society in fostering better use of its constituents in govern- 
ment service. 

First, I suggest that future program committees of the Psycho- 
metric Society consider symposia on practical problems in government 
research in which administrative heads of government agencies would
be invited to participate. Joint consideration of the administrative and 
technical points of view as they bear on particular research problems 
would facilitate mutual understanding. 

Second, perhaps the Editorial Board of your official publication, 
Psychometrika, could take steps to encourage its contributors to write 
more readable articles. Greater emphasis on readability would serve 
several useful ends. It would perhaps encourage authors to develop 
greater skill in conveying their ideas in simple language. It would in 
some cases save time for the well-trained psychometrician. And it clear- 
ly would facilitate the education of would-be psychometricians who 
are assigned technical articles as required reading. 


A committee on problems of psychometricians in government 
might be considered. Such a committee could have various functions, 
including cooperation with government agencies on job descriptions, 
standards of performance, recruitment, and development and rating 
of examinations. It might develop a register of psychometricians 
available for consultative work in government and offer its services es- 
pecially in recruiting for key positions. It could also maintain a coor- 
dinated panel of psychometricians in government and in universities or 
in industry who would be interested in temporary exchange of posi- 
tions. 

Another committee might consider the general question of optimal 
training for psychometricians, both while they are students and after 
they become employed. 

Perhaps a carefully chosen group should be assigned the develop- 
ment of a good public-relations program for psychometricians. One 
possible approach is magazine or Sunday supplement articles high- 
lighting the benefits to be derived from application of psychometric 
principles in government. Eventually I would hope that a public rela- 
tions committee, working in cooperation with government agencies, 
could assist them in dramatizing their appropriation requests pertain- 
ing to psychometric programs. Too little concerted and skilled effort 
has ever been devoted to elucidating to the Bureau of the Budget and to 
congressional committees the actual money savings that would result 
from an up-to-date application of known psychometric methods and an 
enlightened program of psychometric research. 


Finally, I want to say that I should be glad to learn what further 
ideas you may have on the problems I have discussed. For official action 
on the part of the Psychometric Society, however, I am confident that 
your next President will be receptive to your proposals. 
















ESTIMATION OF PARAMETERS IN A TRUNCATED 
TRIVARIATE NORMAL DISTRIBUTION 


D. F. VOTAW, JR., J. A. RAFFERTY, AND W. L. DEEMER
USAF SCHOOL OF AVIATION MEDICINE 
RANDOLPH FIELD, TEXAS 


This paper gives maximum-likelihood estimators for certain
parameters in a truncated trivariate normal distribution when the
values of the other parameters are known. The estimators are func-
tions of a random sample. Approximate variances and covariances
of the estimators, when the sample size is large, are also given.
The type of truncation considered is merely restriction of the range
of one of the variates, whose true mean and variance are assumed
to be known. Two cases of such restriction are treated:
(a) ($\delta \le x < +\infty$); (b) ($-\infty < x \le \delta'$),
where $\delta$ and $\delta'$ are arbitrary
“cutoff points” which are assumed to be known. A precise state-
ment of the estimation problem is given in Section 1. Section 2 con-
tains preliminary calculations. The estimators appear in Section 3.
The asymptotic variances and covariances of the estimators are
given in Section 4. The estimators and their asymptotic variances
and covariances can be easily specialized to be suitable for the case
of a truncated bivariate normal distribution (Sections 3
and 4).


1. Introduction. Let $X_1$, $X_2$, $X_3$ be chance quantities having
the probability density function

$$f_d(x_1, x_2, x_3) = |A_{ij}|^{1/2} (2\pi)^{-3/2} d^{-1} \exp\Big[-\tfrac{1}{2} \sum_{i,j=1}^{3} A_{ij}(x_i - m_i)(x_j - m_j)\Big], \tag{1.1}$$

where $\|A_{ij}\|$ is positive definite, $0 < d \le 1$,

$$x_{11} \le x_1 < +\infty; \qquad -\infty < x_2, x_3 < +\infty, \tag{1.2}$$

and $x_{11}$ is determined by $d$; the symbols $d$, $m_1$, $m_2$, $m_3$, $A_{11}$, $A_{12}$,
$\cdots$, $A_{33}$ represent parameters in the density function. When $d = 1$,
$x_{11} = -\infty$ and the density function in (1.1) is a normal trivariate
density function for which $m_i$ is the “true” mean of $X_i$ and the vari-
ance-covariance matrix of $(X_1, X_2, X_3)$ is

$$\|A^{ij}\| = \|A_{ij}\|^{-1} = \|\sigma_i \sigma_j \rho_{ij}\|, \tag{1.3}$$

where $\sigma_i$ is the “true” standard deviation of $X_i$ and $\rho_{ij} = \rho_{ji}$ is the
“true” coefficient of correlation between $X_i$ and $X_j$. For all values
of $d$ the relation between $d$ and $x_{11}$ is

$$d = \frac{1}{\sigma_1\sqrt{2\pi}} \int_{x_{11}}^{\infty} \exp\big[-(x_1 - m_1)^2/(2\sigma_1^2)\big]\, dx_1 = \frac{1}{\sqrt{2\pi}} \int_{z_{11}}^{\infty} \exp\big[-t^2/2\big]\, dt, \tag{1.4}$$

where $z_{11} = (x_{11} - m_1)/\sigma_1$.
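
As a small numerical illustration of the relation between the retained fraction $d$ and the cutoff, one can invert the standard normal distribution function, since (1.4) says that $d$ is the upper-tail area beyond $z_{11} = (x_{11} - m_1)/\sigma_1$. The sketch below is not part of the original paper: the function name `cutoff_from_fraction` and the example values are assumptions made for illustration, and it uses Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def cutoff_from_fraction(d, m1, sigma1):
    """Return (z11, x11) such that P(X1 >= x11) = d for X1 ~ N(m1, sigma1^2).

    Since d is the upper-tail area beyond z11, z11 = Phi^{-1}(1 - d).
    Requires 0 < d < 1; d = 1 corresponds to no truncation (x11 = -infinity).
    """
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie strictly between 0 and 1")
    z11 = NormalDist().inv_cdf(1.0 - d)   # standardized cutoff
    x11 = m1 + sigma1 * z11               # cutoff on the original scale
    return z11, x11


# Hypothetical example: retain the top 30 percent of a variate
# with mean 50 and standard deviation 10.
z11, x11 = cutoff_from_fraction(0.30, 50.0, 10.0)
```

Going the other way, `1 - NormalDist().cdf(z11)` recovers $d$ from a given standardized cutoff.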

When $d < 1$, $f_d(x_1, x_2, x_3)$ is a truncated trivariate normal den-
sity function, where the truncation is on the range of $X_1$ at the point
$x_{11}$ and the possible values of the chance vector $(X_1, X_2, X_3)$ are in-
dicated by (1.2). Integration of $x_2$ and $x_3$ out of $f_d(x_1, x_2, x_3)$ yields
a truncated univariate normal distribution $f_d(x_1)$ for $X_1$, for which
the “true” mean and variance are, respectively

$$E(X_1) = m_1 + \sigma_1 R_1, \qquad \sigma_{X_1}^2 = \sigma_1^2 (1 + R_2 - R_1^2) = \sigma_1^2 v_{11}, \tag{1.5}$$

where $R_1 = (o_d/d)$, $o_d = [1/\sqrt{2\pi}] \exp[-z_{11}^2/2]$, $R_2 = z_{11} R_1$, $z_{11}$ is de-
fined in (1.4), and $v_{11} = 1 + R_2 - R_1^2$.
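
The truncated mean and variance above can be checked numerically. The following is a minimal sketch, assuming the reading of (1.5) in which $R_1$ is the standard normal ordinate at $z_{11}$ divided by $d$, $R_2 = z_{11} R_1$, and the variance factor is $1 + R_2 - R_1^2$. The function name `upper_truncation_moments` and the dictionary keys are hypothetical and not taken from the paper.

```python
import math


def upper_truncation_moments(m1, sigma1, x11):
    """Mean and variance of X1 ~ N(m1, sigma1^2) restricted to x11 <= X1 < +inf."""
    z11 = (x11 - m1) / sigma1
    # Retained fraction d: upper-tail area of the standard normal beyond z11.
    d = 0.5 * math.erfc(z11 / math.sqrt(2.0))
    # Standard normal ordinate at the cutoff.
    ordinate = math.exp(-0.5 * z11 ** 2) / math.sqrt(2.0 * math.pi)
    r1 = ordinate / d
    r2 = z11 * r1
    v11 = 1.0 + r2 - r1 ** 2
    return {
        "d": d,
        "R1": r1,
        "R2": r2,
        "v11": v11,
        "mean": m1 + sigma1 * r1,
        "variance": sigma1 ** 2 * v11,
    }


# Truncating a standard normal at its mean gives the half-normal distribution,
# whose mean is sqrt(2/pi) and whose variance is 1 - 2/pi.
result = upper_truncation_moments(0.0, 1.0, 0.0)
```

The half-normal case is a convenient check because its moments are known in closed form.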

A truncated trivariate normal distribution is relevant to cer- 
tain problems in psychometrics, biometrics, and various other fields. 
In many of these problems the experimenter may assume that the 
values of some of the parameters are known a priori, and his aim is 
to estimate* the values of the other parameters. We now proceed 
to state the problem treated in this paper. 

Let the parameters whose values are known and the parameters 
whose values are unknown be as indicated in (1.6) [see (1.1), (1.2)]: 


        Known            Unknown
        $m_1$            $m_2$
        $m_3$            $\sigma_2$
        $\sigma_1$       $\rho_{12}$            (1.6)
        $\sigma_3$       $\rho_{23}$
        $\rho_{13}$
        $d$


It is desired to obtain estimators for $m_2$, $\sigma_2$, $\rho_{12}$, $\rho_{23}$ and, when the
sample size is large, to obtain the approximate variances and covari-
ances of the estimators. The problem will be dealt with by means of
the method of maximum likelihood (see 3, section 6.24).


*The estimation is to be made on the basis of a sample of values of $(X_1, X_2, X_3)$.


































D. F. VOTAW, JR., J. A. RAFFERTY, AND W. L. DEEMER 341 


An entirely similar problem arises when the range of $X_1$ is not $(l_1 \le x_1 < \infty)$ but $(-\infty < x_1 \le l_1')$, where

$$d' = [1/(\sigma_1\sqrt{2\pi})] \int_{-\infty}^{l_1'} \exp[-(x_1 - m_1)^2/(2\sigma_1^2)]\,dx_1 = [1/\sqrt{2\pi}] \int_{-\infty}^{z_d'} \exp[-t^2/2]\,dt, \qquad (z_d' = (l_1' - m_1)/\sigma_1), \tag{1.7}$$

and the value of $d'$ is assumed to be known $(0 < d' < 1)$. The estimators in this problem are the same as the estimators in the first problem (see the last paragraph of Section 3), and their asymptotic variances and covariances are easily derived from those obtained in the first problem (see the last two paragraphs of Section 4).


2. Preliminary Calculations. A random sample of size $n$ from a population having the density function given in (1.1) will be represented by $O_n(X_{1\alpha}, X_{2\alpha}, X_{3\alpha})$ $(\alpha = 1, \cdots, n)$ $(n > 3)$. The likelihood function of $(m_2, \sigma_2, \rho_{12}, \rho_{23})$ for any given sample $O_n$ and given values of $m_1, \sigma_1, m_3, \sigma_3, \rho_{13}$ is:

$$L(m_2, \sigma_2, \rho_{12}, \rho_{23};\ O_n, m_1, \sigma_1, m_3, \sigma_3, \rho_{13}) = \prod_{\alpha=1}^{n} f_d(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}). \tag{2.1}$$

Let $(\hat m_2, \hat\sigma_2, \hat\rho_{12}, \hat\rho_{23})$ be the particular vector for which $L$ assumes its maximum value over variations in $(m_2, \sigma_2, \rho_{12}, \rho_{23})$. The quantities $\hat m_2, \hat\sigma_2, \hat\rho_{12}, \hat\rho_{23}$ are the maximum-likelihood estimators of $m_2, \sigma_2, \rho_{12}, \rho_{23}$, respectively [see (3.1)].
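At this point it is convenient to introduce certain notations. Let

$$\begin{aligned}
D &= |\rho_{ij}| = 1 + 2\rho_{12}\rho_{13}\rho_{23} - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2, \\
u &= 1 - \rho_{13}^2, \qquad s = 1 - \rho_{12}^2, \qquad t = 1 - \rho_{23}^2, \\
v &= \rho_{13}\rho_{23} - \rho_{12}, \qquad y = \rho_{12}\rho_{13} - \rho_{23}, \quad \text{and} \quad z = \rho_{12}\rho_{23} - \rho_{13}.
\end{aligned} \tag{2.2}$$

It follows from (2.2) that

$$D = u + v\rho_{12} + y\rho_{23}, \qquad v\rho_{13} + y = -u\rho_{23}, \qquad y\rho_{13} + v = -u\rho_{12}. \tag{2.3}$$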
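The sample means and sums of squares and cross-products will be represented as follows:

$$\bar X_i = \sum_{\alpha} X_{i\alpha}/n, \quad (i = 1, 2, 3), \qquad a_{ij} = \sum_{\alpha} (X_{i\alpha} - \bar X_i)(X_{j\alpha} - \bar X_j) = r_{ij}\sqrt{a_{ii}a_{jj}}, \quad (i, j = 1, 2, 3), \tag{2.4}$$

where $r_{ij}$ is the sample coefficient of correlation between $X_i$ and $X_j$. The following notation, which is parallel with (2.2), will be useful:

$$\begin{aligned}
\hat D &= |r_{ij}| = 1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2, \\
\hat u &= 1 - r_{13}^2, \qquad \hat s = 1 - r_{12}^2, \qquad \hat t = 1 - r_{23}^2, \\
\hat v &= r_{13}r_{23} - r_{12}, \qquad \hat y = r_{12}r_{13} - r_{23}, \quad \text{and} \quad \hat z = r_{12}r_{23} - r_{13}.
\end{aligned} \tag{2.5}$$

There are identities, similar to those in (2.3), that involve quantities defined in (2.5). The adjoint of the matrix $\|a_{ij}\|$ will be represented by

$$\|a^{ij}\|. \tag{2.6}$$

The density function (1.1) can be written as

$$h_d(x_1, x_3)\, g(x_2 | x_1, x_3), \tag{2.7}$$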


where

$$h_d(x_1, x_3) = [2\pi\sigma_1\sigma_3 d]^{-1} u^{-1/2} \exp\{-(1/2u)[(x_1 - m_1)^2/\sigma_1^2 + (x_3 - m_3)^2/\sigma_3^2 - 2\rho_{13}(x_1 - m_1)(x_3 - m_3)/(\sigma_1\sigma_3)]\}, \tag{2.8}$$

which is a truncated normal bivariate density function, and

$$g(x_2 | x_1, x_3) = [\sigma\sqrt{2\pi}]^{-1} \exp[-(1/2\sigma^2)(x_2 - c - ax_1 - bx_3)^2], \tag{2.9}$$

which is the conditional density function of $X_2$ given $(X_1, X_3)$, where

$$\begin{aligned}
\sigma^2 &= \sigma_2^2 D/u, \\
a &= -\sigma_2 v/(\sigma_1 u), \\
b &= -\sigma_2 y/(\sigma_3 u), \\
c &= m_2 + m_1\sigma_2 v/(\sigma_1 u) + m_3\sigma_2 y/(\sigma_3 u),
\end{aligned} \tag{2.10}$$

[see (2.2)]. $g(x_2 | x_1, x_3)$ is a normal univariate density function where $x_1$ and $x_3$ are “fixed” variates. From (2.10) it follows that

$$\begin{aligned}
\sigma_2^2 &= \sigma^2 + \sigma_1^2 a^2 + \sigma_3^2 b^2 + 2ab\,\sigma_1\sigma_3\rho_{13}, \\
\rho_{12} &= a\sigma_1/\sigma_2 + b\sigma_3\rho_{13}/\sigma_2, \\
\rho_{23} &= a\sigma_1\rho_{13}/\sigma_2 + b\sigma_3/\sigma_2, \\
m_2 &= c + am_1 + bm_3,
\end{aligned} \tag{2.11}$$

[see (2.3)]. The transformation in (2.11) and (2.10) from $(\sigma, a, b, c)$ to $(\sigma_2, \rho_{12}, \rho_{23}, m_2)$ is continuous and one-to-one.

It should be noted that $h_d(x_1, x_3)$ [see (2.8)] contains only parameters whose values are known and $g(x_2 | x_1, x_3)$ [see (2.9)] contains all the parameters whose values are unknown [see (1.6)]; these assertions together with (2.7) imply that for variations of $(m_2, \sigma_2, \rho_{12}, \rho_{23})$, $h_d(x_1, x_3)$ is a constant and thus that the value of $(m_2, \sigma_2, \rho_{12}, \rho_{23})$ for which the likelihood

$$L = \prod_{\alpha=1}^{n} [h_d(X_{1\alpha}, X_{3\alpha})\, g(X_{2\alpha} | X_{1\alpha}, X_{3\alpha})] \tag{2.12}$$

assumes its maximum is the same as that for which

$$\prod_{\alpha=1}^{n} g(X_{2\alpha} | X_{1\alpha}, X_{3\alpha})$$

assumes its maximum. It follows that from the maximum-likelihood estimators of $\sigma, a, b, c$ [see (2.9)] the maximum-likelihood estimators of $\sigma_2, \rho_{12}, \rho_{23}, m_2$ can be obtained [see (2.11) and 3, p. 143].
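The one-to-one correspondence between $(\sigma, a, b, c)$ and $(\sigma_2, \rho_{12}, \rho_{23}, m_2)$ can be illustrated by a round trip. This is a minimal sketch under the reading of (2.2), (2.10), and (2.11) used here; the function names and the numerical values are hypothetical.

```python
import math

def to_regression_form(m2, s2, r12, r23, m1, s1, m3, s3, r13):
    """(2.10): map (m2, sigma2, rho12, rho23) to (sigma, a, b, c)."""
    u = 1.0 - r13 ** 2
    v = r13 * r23 - r12
    y = r12 * r13 - r23
    D = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    sigma = s2 * math.sqrt(D / u)
    a = -s2 * v / (s1 * u)
    b = -s2 * y / (s3 * u)
    c = m2 + m1 * s2 * v / (s1 * u) + m3 * s2 * y / (s3 * u)
    return sigma, a, b, c

def from_regression_form(sigma, a, b, c, m1, s1, m3, s3, r13):
    """(2.11): map (sigma, a, b, c) back to (m2, sigma2, rho12, rho23)."""
    s2 = math.sqrt(sigma ** 2 + s1 ** 2 * a ** 2 + s3 ** 2 * b ** 2
                   + 2.0 * a * b * s1 * s3 * r13)
    r12 = (a * s1 + b * s3 * r13) / s2
    r23 = (a * s1 * r13 + b * s3) / s2
    m2 = c + a * m1 + b * m3
    return m2, s2, r12, r23

known = dict(m1=100.0, s1=15.0, m3=20.0, s3=5.0, r13=0.5)   # hypothetical values
reg = to_regression_form(50.0, 10.0, 0.4, 0.3, **known)
back = from_regression_form(*reg, **known)
```

Here `back` reproduces the starting values (50, 10, 0.4, 0.3) up to rounding, which is the property the argument relies on when it transfers maximum-likelihood estimators from one parametrization to the other.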
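The maximum-likelihood estimators of $\sigma, a, b, c$ are,* respectively:

$$\begin{aligned}
\hat\sigma^2 &= |a_{ij}|/(n a^{22}) = (a_{22}/n\hat u)[\hat u + \hat v r_{12} + \hat y r_{23}], \\
\hat a &= -a^{12}/a^{22} = -(\hat v/\hat u)\sqrt{a_{22}/a_{11}}, \\
\hat b &= -a^{23}/a^{22} = -(\hat y/\hat u)\sqrt{a_{22}/a_{33}}, \\
\hat c &= \bar X_2 - \hat a\bar X_1 - \hat b\bar X_3 = \bar X_2 + \bar X_1(\hat v/\hat u)\sqrt{a_{22}/a_{11}} + \bar X_3(\hat y/\hat u)\sqrt{a_{22}/a_{33}}.
\end{aligned} \tag{2.13}$$

*See (e) and (c), p. 160, and (g), p. 162, in (3).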
































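3. Estimators for $\sigma_2$, $\rho_{12}$, $\rho_{23}$, $m_2$. From (2.13) and (2.11) it follows that the maximum-likelihood estimators of $\sigma_2, \rho_{12}, \rho_{23}, m_2$ are, respectively:

$$\begin{aligned}
\hat\sigma_2 &= \{(a_{22}/n\hat u^2)[\hat u\hat D + \sigma_1^2\hat v^2(n/a_{11}) + \sigma_3^2\hat y^2(n/a_{33}) + 2\rho_{13}\sigma_1\sigma_3\hat v\hat y\,(n/\sqrt{a_{11}a_{33}})]\}^{1/2} \\
&= (\sqrt{a_{22}/n}/\hat u)\sqrt{\hat k}, \text{ say}, \\
\hat\rho_{12} &= -(1/\sqrt{\hat k})[\hat v\sigma_1/\sqrt{a_{11}/n} + \hat y\rho_{13}\sigma_3/\sqrt{a_{33}/n}], \\
\hat\rho_{23} &= -(1/\sqrt{\hat k})[\hat y\sigma_3/\sqrt{a_{33}/n} + \hat v\rho_{13}\sigma_1/\sqrt{a_{11}/n}], \\
\hat m_2 &= \bar X_2 + (\hat v/\hat u)(\bar X_1 - m_1)\sqrt{a_{22}/a_{11}} + (\hat y/\hat u)(\bar X_3 - m_3)\sqrt{a_{22}/a_{33}}.
\end{aligned} \tag{3.1}$$

[See (2.4), (2.5)].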
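The estimators in (3.1) can be computed directly from a sample. The sketch below is illustrative only: it assumes NumPy, uses hypothetical function names, and follows the formulas as read here. A second function goes through an ordinary least-squares fit of $X_2$ on $(X_1, X_3)$ followed by (2.11), which should give the same numbers.

```python
import numpy as np

def ml_estimates(X, m1, s1, m3, s3, r13):
    """Closed-form estimators (3.1); X is an (n, 3) array of (X1, X2, X3)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    dev = X - xbar
    a = dev.T @ dev                                   # a_ij of (2.4)
    a11, a22, a33 = np.diag(a)
    r = a / np.sqrt(np.outer(np.diag(a), np.diag(a)))
    q12, q13, q23 = r[0, 1], r[0, 2], r[1, 2]         # sample correlations
    D = np.linalg.det(r)
    u = 1.0 - q13 ** 2
    v = q13 * q23 - q12
    y = q12 * q13 - q23
    k = (u * D + s1 ** 2 * v ** 2 * n / a11 + s3 ** 2 * y ** 2 * n / a33
         + 2.0 * r13 * s1 * s3 * v * y * n / np.sqrt(a11 * a33))
    s2_hat = np.sqrt(a22 / n) * np.sqrt(k) / u
    r12_hat = -(v * s1 / np.sqrt(a11 / n) + y * r13 * s3 / np.sqrt(a33 / n)) / np.sqrt(k)
    r23_hat = -(y * s3 / np.sqrt(a33 / n) + v * r13 * s1 / np.sqrt(a11 / n)) / np.sqrt(k)
    m2_hat = (xbar[1] + (v / u) * (xbar[0] - m1) * np.sqrt(a22 / a11)
              + (y / u) * (xbar[2] - m3) * np.sqrt(a22 / a33))
    return np.array([m2_hat, s2_hat, r12_hat, r23_hat])

def ml_estimates_via_regression(X, m1, s1, m3, s3, r13):
    """Same estimators via least squares on (1, X1, X3) and then (2.11)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 2]])
    coef, *_ = np.linalg.lstsq(design, X[:, 1], rcond=None)
    c, a, b = coef
    sigma2 = np.mean((X[:, 1] - design @ coef) ** 2)   # divisor n, as in (2.13)
    s2 = np.sqrt(sigma2 + s1 ** 2 * a ** 2 + s3 ** 2 * b ** 2 + 2 * a * b * s1 * s3 * r13)
    r12 = (a * s1 + b * s3 * r13) / s2
    r23 = (a * s1 * r13 + b * s3) / s2
    return np.array([c + a * m1 + b * m3, s2, r12, r23])
```

A simulated usage, with made-up population values, would draw trivariate normal data, keep only the rows with $X_1$ above the cut-off, and pass the retained rows together with the known $m_1, \sigma_1, m_3, \sigma_3, \rho_{13}$ to either function.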

The results given in (3.1) constitute a solution of the “first half” (so to speak) of the problem stated in Section 1. By a specialization of these results, estimators of parameters in a certain truncated bivariate normal distribution can be obtained. Specifically, let $f_d(x_1, x_2)$ be the density function obtained from (1.1) by integrating out $x_3$. It can be shown that

$$f_d(x_1, x_2) = [2\pi\sigma_1\sigma_2 d]^{-1} s^{-1/2} \exp\{-(1/2s)[(x_1 - m_1)^2/\sigma_1^2 + (x_2 - m_2)^2/\sigma_2^2 - 2\rho_{12}(x_1 - m_1)(x_2 - m_2)/(\sigma_1\sigma_2)]\}, \tag{3.2}$$
$$(l_1 \le x_1 < +\infty;\ -\infty < x_2 < +\infty).$$

$f_d(x_1, x_2)$ is a truncated bivariate normal density function in which [see (1.6)] the values of $d, m_1, \sigma_1$ are known and the values of $m_2, \sigma_2, \rho_{12}$ are unknown. Let $O'_n(X_{1\alpha}, X_{2\alpha})$ $(\alpha = 1, \cdots, n)$ be a random sample from $f_d(x_1, x_2)$. It can be shown that by setting $r_{13} = r_{23} = 0$ in $\hat\sigma_2, \hat\rho_{12}, \hat m_2$ [in (3.1)] the resulting expressions are maximum-likelihood estimators, $\tilde\sigma_2, \tilde\rho_{12}, \tilde m_2$, of $\sigma_2, \rho_{12}, m_2$, respectively, in $f_d(x_1, x_2)$; thus, from (2.5) and (3.1) it follows that*

$$\begin{aligned}
\tilde\sigma_2 &= \{(a_{22}/n)[1 - r_{12}^2 + \sigma_1^2 r_{12}^2 (n/a_{11})]\}^{1/2}, \\
\tilde\rho_{12} &= r_{12}\sigma_1 (n/a_{11})^{1/2} [1 - r_{12}^2 + \sigma_1^2 r_{12}^2 (n/a_{11})]^{-1/2}, \\
\tilde m_2 &= \bar X_2 - r_{12}(\bar X_1 - m_1)(a_{22}/a_{11})^{1/2}.
\end{aligned} \tag{3.3}$$

*For large $n$ the approximate variances and covariances of $\sqrt n(\tilde m_2 - m_2)$, $\sqrt n(\tilde\sigma_2 - \sigma_2)$, $\sqrt n(\tilde\rho_{12} - \rho_{12})$ are given in Section 4.
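The bivariate estimator of $\rho_{12}$ in (3.3) is the familiar correction of a correlation coefficient for restriction of range. A minimal sketch, with hypothetical names and numbers, writing $s_1 = \sqrt{a_{11}/n}$ for the standard deviation of $X_1$ in the restricted sample:

```python
import math

def corrected_correlation(r12, s1_restricted, sigma1):
    """rho_12 estimate of (3.3) from the restricted-sample correlation r12,
    the restricted-sample s.d. of X1, and the known unrestricted sigma_1."""
    ratio = sigma1 / s1_restricted
    return r12 * ratio / math.sqrt(1.0 - r12 ** 2 + r12 ** 2 * ratio ** 2)

def corrected_sigma2(a22_over_n, r12, s1_restricted, sigma1):
    """sigma_2 estimate of (3.3); a22_over_n is the restricted-sample variance of X2."""
    ratio = sigma1 / s1_restricted
    return math.sqrt(a22_over_n * (1.0 - r12 ** 2 + r12 ** 2 * ratio ** 2))

# Hypothetical example: r12 = .30 in a sample whose X1 spread is half the known value.
rho_est = corrected_correlation(0.30, 5.0, 10.0)
```

When the sample spread equals the known $\sigma_1$ the function returns $r_{12}$ unchanged; when selection has halved the spread, as in the hypothetical call above, the estimate rises from .30 to roughly .53.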























The expression for $\tilde\rho_{12}$ is given in [2, (li), p. 23] and [1, (2), p. 2]. It should be noted that in general $\hat\rho_{12}$ and $\tilde\rho_{12}$ are not equal.

In the case that the range of $X_1$ is $(-\infty < x_1 \le l_1')$ (see the last paragraph of Section 1), the estimators of $m_2, \sigma_2, \rho_{12}, \rho_{23}$ are $\hat m_2, \hat\sigma_2, \hat\rho_{12}, \hat\rho_{23}$ given in (3.1). For the corresponding truncated bivariate normal distribution [given by (3.2) with $(l_1 \le x_1 < +\infty)$ replaced by $(-\infty < x_1 \le l_1')$], the estimators of $m_2$, $\sigma_2$, and $\rho_{12}$ are $\tilde m_2, \tilde\sigma_2, \tilde\rho_{12}$ given in (3.3).


4. The Asymptotic Variances and Covariances of the Estimators. Under very general conditions on a distribution function the maximum-likelihood estimators of the parameters (whose values are unknown) have a joint distribution that is approximately a normal multivariate distribution when the sample size is large (3, p. 139).

It can be shown that when $n$ is large $\xi_1 = \sqrt n(\hat m_2 - m_2)$, $\xi_2 = \sqrt n(\hat\sigma_2 - \sigma_2)$, $\xi_3 = \sqrt n(\hat\rho_{12} - \rho_{12})$, and $\xi_4 = \sqrt n(\hat\rho_{23} - \rho_{23})$ have a 4-variate normal distribution approximately, where each of the 4 means equals 0 and the variance-covariance matrix, say $\|C^{pq}\|$ $(p, q = 1, 2, 3, 4)$, depends on $f_d(x_1, x_2, x_3)$ [see (1.1)]. $C^{pq}$ is the approximate covariance of $\xi_p$ and $\xi_q$ $(p \ne q)$; $C^{pp}$ is the approximate variance of $\xi_p$.

Application of the theory in (3, section 6.24) yields $\|C^{pq}\|$ given in (4.1), where the definitions of the symbols used are given in (1.4) and (2.2).

[Equation (4.1), the 4 × 4 matrix $\|C^{pq}\|$, is printed sideways on a full page in the original and is not legible in this copy.]

The approximate variances and covariances (also correlations) associated with $\hat m_2$, $\hat\sigma_2$, $\hat\rho_{12}$, and $\hat\rho_{23}$ for large $n$ can be obtained very easily from (4.1); for example, the approximate variances of $\hat\rho_{12}$ and $\hat\rho_{23}$ are, respectively:

$$\begin{aligned}
\sigma^2_{\hat\rho_{12}} &= C^{33}/n = [D/(2u^2 v_d n)]\{2us^2 + \rho_{12}^2(D + 2y^2)v_d\}, \\
\sigma^2_{\hat\rho_{23}} &= C^{44}/n = [D/(2u^2 v_d n)]\{2uz^2 + v_d[2u(D + t\rho_{12}^2) - D\rho_{23}^2]\},
\end{aligned} \tag{4.2}$$

and the approximate correlation between $\hat\rho_{12}$ and $\hat\rho_{23}$ is

$$C^{34}/\sqrt{C^{33}C^{44}} = -[2usz + \rho_{12}v_d(D\rho_{23} + 2\rho_{12}uz)] \times \{2uz^2 + v_d[2u(D + t\rho_{12}^2) - D\rho_{23}^2]\}^{-1/2} \times \{2us^2 + \rho_{12}^2(D + 2y^2)v_d\}^{-1/2}. \tag{4.3}$$
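The large-sample formulas for $\hat\rho_{12}$ and $\hat\rho_{23}$ are easy to evaluate numerically. The sketch below assumes the following reading of (4.2) and (4.3): the common factor is $D/(2u^2 v_d n)$, the bracket for $\hat\rho_{12}$ is $2us^2 + \rho_{12}^2(D + 2y^2)v_d$, the bracket for $\hat\rho_{23}$ is $2uz^2 + v_d[2u(D + t\rho_{12}^2) - D\rho_{23}^2]$, and the numerator of the correlation is $-[2usz + \rho_{12}v_d(D\rho_{23} + 2\rho_{12}uz)]$, with $s, t, u, y, z, D$ as in (2.2). The function name is hypothetical.

```python
import math

def rho_asymptotics(r12, r13, r23, v_d, n):
    """Approximate var(rho12_hat), var(rho23_hat) and their correlation,
    under the reading of (4.2)-(4.3) stated in the lead-in."""
    u = 1.0 - r13 ** 2
    s = 1.0 - r12 ** 2
    t = 1.0 - r23 ** 2
    y = r12 * r13 - r23
    z = r12 * r23 - r13
    D = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    q33 = 2.0 * u * s ** 2 + r12 ** 2 * (D + 2.0 * y ** 2) * v_d
    q44 = 2.0 * u * z ** 2 + v_d * (2.0 * u * (D + t * r12 ** 2) - D * r23 ** 2)
    q34 = -(2.0 * u * s * z + r12 * v_d * (D * r23 + 2.0 * r12 * u * z))
    scale = D / (2.0 * u ** 2 * v_d * n)
    return scale * q33, scale * q44, q34 / math.sqrt(q33 * q44)

# No truncation (v_d = 1) and rho13 = rho23 = 0: a bivariate problem with
# m1 and sigma1 known, for which var(rho12_hat) is (1 - rho^2)^2 (2 - rho^2) / (2n).
var12, var23, corr = rho_asymptotics(0.5, 0.0, 0.0, 1.0, 100)
```

In the special case shown, the first returned value equals $(1 - \rho_{12}^2)^2(2 - \rho_{12}^2)/(2n)$, which is smaller than the textbook $(1 - \rho^2)^2/n$ because $m_1$ and $\sigma_1$ are treated as known.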


Thus the solution* of the “second half” of the problem stated in Section 1 is given by (4.1).

The approximate variance-covariance matrix of $\sqrt n(\tilde m_2 - m_2)$, $\sqrt n(\tilde\sigma_2 - \sigma_2)$ and $\sqrt n(\tilde\rho_{12} - \rho_{12})$ for large $n$ is the upper left-hand 3-order principal sub-matrix of the matrix $\|C^{pq}\|$ with $\rho_{13}$ and $\rho_{23}$ replaced by 0 [see (4.1) and (3.3)].

When the range of $X_1$ is $(-\infty < x_1 \le l_1')$ instead of $(l_1 \le x_1 < +\infty)$ (see the last paragraph of Section 1), the asymptotic variances and covariances of $\hat m_2, \hat\sigma_2, \hat\rho_{12}, \hat\rho_{23}$ can be obtained from (4.3) very easily, as follows. Let $\xi'_1 = \sqrt n(\hat m_2 - m_2)$, $\xi'_2 = \sqrt n(\hat\sigma_2 - \sigma_2)$, $\xi'_3 = \sqrt n(\hat\rho_{12} - \rho_{12})$, $\xi'_4 = \sqrt n(\hat\rho_{23} - \rho_{23})$. In (4.3) set $d = d'$, replace $R_1$ by $(-R_1)$, and replace $z_d$ by $(-z_d)$. For large $n$ the approximate variances and covariances of $\xi'_1, \xi'_2, \xi'_3, \xi'_4$ are then given by the corresponding expressions in (4.3) for the approximate variances and covariances of $\xi_1, \xi_2, \xi_3, \xi_4$.

When the distribution is a truncated bivariate normal distribution [given by (3.2) with $(l_1 \le x_1 < +\infty)$ replaced by $(-\infty < x_1 \le l_1')$], approximate variances and covariances of $\sqrt n(\tilde m_2 - m_2)$, $\sqrt n(\tilde\sigma_2 - \sigma_2)$, $\sqrt n(\tilde\rho_{12} - \rho_{12})$ for large $n$ are given by the approximate variances and covariances of $\xi'_1, \xi'_2, \xi'_3$, with $\rho_{13}$ and $\rho_{23}$ replaced by 0 [see (3.3) and the last paragraph of Section 3].


*Thanks are due Mr. Richard Gardner for many of the calculations re- 
quired to obtain (4.1). 


REFERENCES
1. Deemer, W. L., Horst, A. P., Thorndike, R. L., and Whitney, A. G. The correction of correlation coefficients for restriction of range. Psychological Section, Office of the Surgeon, Headquarters, Army Air Forces Training Command, Technical Bulletin Hq. 48-6, 24 November 1943.
2. Pearson, Karl. Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs. Philos. Trans. Roy. Soc., 1903, Series A, 200, 1-66.
3. Wilks, S. S. Mathematical statistics. Princeton, N. J.: Princeton Univ. Press, 1948.


Manuscript received 11/29/49 























PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


THE JOHNSON-NEYMAN TECHNIQUE, 
ITS THEORY AND APPLICATION 


PALMER O. JOHNSON
UNIVERSITY OF MINNESOTA
AND
LEO C. FAY
STATE TEACHERS COLLEGE, CORTLAND, N. Y.


The theoretical basis for the Johnson-Neyman Technique is 
here presented for the first time in an American journal. In addi- 
tion, a simplified working procedure is outlined, step-by-step, for 
an actual problem. The determination of significance is arrived at 
early in the analysis; and where no significant difference is found, 
the problem is complete at this point. The plotting of the region of 
significance where a significant difference does exist has also been 
simplified by using the procedure of rotation and translation of 
axes.


Modern statistical procedures are primarily concerned with two 
basic problems—determining significance and estimating parameter 
values. The Johnson-Neyman technique may be used for both prob- 
lems; however, its primary contribution is the determination of sig- 
nificance of differences in group performance. The ordinary tests of 
significance of difference, e.g., Fisher’s ‘‘t’’ test and the analysis of 
variance and of covariance, test whether a statistically significant 
difference exists between the mean performances of the groups being 
compared. 

The Johnson-Neyman technique makes a unique, additional con- 
tribution in that it defines the population, in terms of the control 
variables, for which the conclusion of significant difference of mean 
performance may be held. In the problem outlined here the conclu- 
sion that a significant difference exists in social studies achievement 
between good and poor readers is valid for all populations defined 
by the region of significance. 

The purpose of this paper is to present a theoretical consideration of the technique as well as a simplified working procedure for determining both whether a region of significance exists and, where it does exist, how it may be more readily plotted. The theoretical background has appeared only in a foreign journal,* and the working procedures have hitherto been described only in the lectures of one of the authors in an advanced course in statistics for graduate students.

*Johnson, Palmer O., and Neyman, J. Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1936, 1, 72-98.


Theoretical Considerations 


The problem considered is that of determining whether some 
difference in conditions, say A and B, influences a certain numerical 
character of individuals who have been classified into two groups, 
G, and G.. One or more characteristics, called basic characters, are 
used as the basis for comparing the potentialities of the two groups 
for achieving the experimental character or criterion. Both the con- 
trasting conditions A and B as well as the difference in the match- 
ing characteristics of the individuals or groups may influence the 
value of the experimental factor. It is necessary, therefore, that we 
adjust the difference in the means (or in other statistical functions 
of the observational data) of the groups for such influences upon the 
experimental factor which might have resulted from the inequalities 
in the basic characters in the two groups. 

Statistically the problem is that of testing a linear hypothesis. 
The single principle of the likelihood ratio is then used for testing 
the linear hypothesis. The most general problem calling for the application of analysis of variance and covariance is also expressed in the form of testing a linear hypothesis. However, with the Johnson-
Neyman technique the analysis of the problem is carried further, in 
that a “region of significance” is established. If this region of sig- 
nificance is found to exist in a particular problem, it becomes pos- 
sible to specify all the systems of values of the basic characters of 
matching for which the null hypothesis involving such systems would 
be rejected. 

We first outline the procedure of testing the linear hypothesis 
to which form our practical problem is translated. We then describe 
the method of setting up the region of significance.* 

Testing the linear hypothesis. First it is necessary to specify an appropriate notation and to set up the statistical hypothesis to be tested. The basic characters for determining the comparability of the two groups take the part of independent variables, and the experimental character that of a dependent variable. There may be any number of independent variables. Here we shall consider two, which we shall denote by $X$ and $Y$. The values of the experimental or observational character for the two groups under comparison will be specified by $Z_1$ and $Z_2$.

*For the detailed mathematical formulation and solution of the problem the reader is referred to the original article cited in the preceding footnote.

The basic assumptions may be set forth by stating that the population means of $Z_1$ and $Z_2$ of individuals with fixed values of $X$ and $Y$ are some function of these values, $X$ and $Y$, i.e.,

$$E(Z_1) = f_1(X, Y); \qquad E(Z_2) = f_2(X, Y). \tag{1}$$

We wish to test the hypothesis $H(X', Y')$ that the two means are equal, or

$$f_1(X', Y') = f_2(X', Y'). \tag{2}$$

It is assumed here that $f_1$ and $f_2$ are polynomials of the first degree with respect to the characters of matching. We may, therefore, write:

$$E(Z_1) = A_0 + A_1 X + A_2 Y. \tag{3}$$
$$E(Z_2) = B_0 + B_1 X + B_2 Y. \tag{4}$$

The coefficients, $A$'s and $B$'s, play the role of parameters. The population mean of each observation is a linear function of these parameters.

The hypothesis $H(X', Y')$ to be tested is expressed by the equation

$$\theta(X', Y') = A_0 - B_0 + (A_1 - B_1)X' + (A_2 - B_2)Y' = 0. \tag{5}$$

In this discussion we shall not use the hypothesis $H(X', Y')$ for a particular system of fixed values, say $\bar x_1 = \bar x_2 = X'$, $\bar y_1 = \bar y_2 = Y'$, but we shall try to determine all the systems of values $X', Y'$ for which the hypothesis $H(X', Y')$ should be rejected. Geometrically each system of $X'$ and $Y'$, or each hypothesis $H(X', Y')$, will be represented by a point on the plane with coordinates $X'$ and $Y'$. If there are systems of values of $X', Y'$ for which $H(X', Y')$ should be rejected, generally it will be possible to specify a region, say $R$, which we shall call the region of significance. For every point representing $H(X', Y')$ in this region, the hypothesis $H(X', Y')$ should be rejected.

Before indicating the method for finding the region of signifi- 
cance, it will be necessary to collect here all the formulae of use in 
our discussion. 

The criterion appropriate to test the linear hypothesis $H(X', Y')$ is obtained by the following steps:

1. Obtain the absolute minimum of a sum of squares designated by $S_a^2$. This is readily obtained from the theory of multiple correlation and is given by the relationship

$$S_a^2 = n_1\sigma_{z_1}^2\,\frac{1 - r_{xy}'^2 - r_{xz_1}^2 - r_{yz_1}^2 + 2r_{xy}'\,r_{xz_1}r_{yz_1}}{1 - r_{xy}'^2} + n_2\sigma_{z_2}^2\,\frac{1 - r_{xy}''^2 - r_{xz_2}^2 - r_{yz_2}^2 + 2r_{xy}''\,r_{xz_2}r_{yz_2}}{1 - r_{xy}''^2}, \tag{6}$$

where $n_1$ and $n_2$ denote the number of individuals subjected to conditions A and B, respectively; $\sigma_{z_1}$, $r_{xz_1}$, $r_{yz_1}$, and the $r'$'s are the standard deviation and coefficients of correlation calculated for the group subjected to the condition A; $\sigma_{z_2}$, $r_{xz_2}$, $r_{yz_2}$, and the $r''$'s, the corresponding values for the other group subjected to B.

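2. Calculate the relative minimum of a sum of squares specified by $S_r^2$, where

$$S_r^2 = S_a^2 + S_b^2 \quad \text{with } S_b^2 \ge 0, \tag{7}$$

or in a form suitable for numerical computation as

$$S_b^2 = \frac{F^2}{P + Q}, \tag{8}$$

where

$$\begin{aligned}
F = {} & \bar Z_1 + \frac{r_{xz_1} - r_{xy}'\,r_{yz_1}}{1 - r_{xy}'^2}\cdot\frac{\sigma_{z_1}}{\sigma_x'}(X' - \bar X_1) + \frac{r_{yz_1} - r_{xy}'\,r_{xz_1}}{1 - r_{xy}'^2}\cdot\frac{\sigma_{z_1}}{\sigma_y'}(Y' - \bar Y_1) \\
& - \bar Z_2 - \frac{r_{xz_2} - r_{xy}''\,r_{yz_2}}{1 - r_{xy}''^2}\cdot\frac{\sigma_{z_2}}{\sigma_x''}(X' - \bar X_2) - \frac{r_{yz_2} - r_{xy}''\,r_{xz_2}}{1 - r_{xy}''^2}\cdot\frac{\sigma_{z_2}}{\sigma_y''}(Y' - \bar Y_2),
\end{aligned} \tag{9}$$

where $\bar Z_1$, $\bar X_1$, and $\bar Y_1$ are the means for the group subjected to A; $\bar Z_2$, $\bar X_2$, $\bar Y_2$ are the means for the group subjected to B; and the designations for the other terms correspond to those given in Step 1. Also

$$\begin{aligned}
P &= \frac{1}{n_1}\left\{1 + \frac{1}{1 - r_{xy}'^2}\left[\frac{(X' - \bar X_1)^2}{\sigma_x'^2} - 2r_{xy}'\frac{(X' - \bar X_1)(Y' - \bar Y_1)}{\sigma_x'\sigma_y'} + \frac{(Y' - \bar Y_1)^2}{\sigma_y'^2}\right]\right\}, \\
Q &= \frac{1}{n_2}\left\{1 + \frac{1}{1 - r_{xy}''^2}\left[\frac{(X' - \bar X_2)^2}{\sigma_x''^2} - 2r_{xy}''\frac{(X' - \bar X_2)(Y' - \bar Y_2)}{\sigma_x''\sigma_y''} + \frac{(Y' - \bar Y_2)^2}{\sigma_y''^2}\right]\right\}.
\end{aligned} \tag{10}$$

It may be noted that $P + Q$ depends only upon the basic characters of matching, the $X$'s and the $Y$'s; also that it attains its minimum value when $\bar X_1 = \bar X_2 = X'$, $\bar Y_1 = \bar Y_2 = Y'$, where $P + Q = 1/n_1 + 1/n_2$.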


3. Form the likelihood criterion $L$ by dividing $S_a^2$ by $S_r^2$, or

$$L = \frac{S_a^2}{S_r^2}. \tag{11}$$

Tables of the incomplete beta function* are entered with $p = \tfrac12(n - s)$ and $q = \tfrac12 r$ and the probability value corresponding to $L$ obtained; $r$ = the number of equations required to express the hypothesis tested; $n$ = the number of individuals (i.e., here $n_1 + n_2$); $s$ = the number of independent parameters of which the population mean of the experimental character is assumed to be a linear function with known coefficients. It is seen that $L$ is a ratio of one sum of squares to two sums of squares:

$$L = \frac{\chi_0^2}{\chi_0^2 + \chi_1^2}, \tag{12}$$

where the number of degrees of freedom are for $\chi_0^2$, $n - s$; for $\chi_1^2$, $r$. The distribution of $L$ is

$$p(L) = \text{Constant}\; L^{\frac{n-s}{2} - 1}(1 - L)^{\frac{r}{2} - 1}, \tag{13}$$

with the two ends fixed at 0 and 1. $L$ must be between 0 and 1, since the relative cannot be less than the absolute minimum. The smaller $L$, the less likely the hypothesis is to be true. The probability that, if it were true, as small a value of $L$ could be obtained through random errors of sampling, can be obtained from the table which gives the values for the .05 and the .01 levels of significance.

It is to be noted that Fisher's $z$ tables or Snedecor's $F$ tables may also be used.
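The remark that $F$ tables may be used in place of the incomplete beta function rests on a standard one-to-one relation between the two statistics: with $n - s$ and $r$ degrees of freedom, $F = \frac{1 - L}{L}\cdot\frac{n - s}{r}$. The following sketch states that relation in code; the function names and the numbers in the example are hypothetical, not taken from the paper.

```python
def likelihood_criterion(s_a2, s_r2):
    """L of (11): absolute minimum divided by relative minimum."""
    return s_a2 / s_r2

def equivalent_f(L, n, s, r):
    """Variance ratio with (r, n - s) degrees of freedom that corresponds to L."""
    return (1.0 - L) / L * (n - s) / r

def criterion_from_f(F, n, s, r):
    """Inverse relation: the value of L that corresponds to a given F."""
    return 1.0 / (1.0 + r * F / (n - s))

# Hypothetical numbers: S_a^2 = 90, S_r^2 = 100, n = 30, s = 6, r = 1.
L = likelihood_criterion(90.0, 100.0)
F = equivalent_f(L, n=30, s=6, r=1)
```

Small values of $L$ correspond to large values of $F$, so the lower tail of the beta distribution in (13) and the upper tail of the $F$ distribution lead to the same decision.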


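Determining the region of significance. Denote by $\varepsilon$ a number such that if the value of $P$ found from the Tables of the Incomplete Beta Function is less than or equal to $\varepsilon$, we reject the hypothesis tested, $H(X', Y')$, whereas if $P > \varepsilon$ we accept $H(X', Y')$. This value, which may be arbitrarily chosen by any research worker (e.g., $\varepsilon$ = .05, .01, etc.), is the level of significance. Denote by $L_\varepsilon$ the value of $L$ corresponding to the chosen level of significance. Now the region $R$ will be defined by the inequality

$$S_a^2 \le L_\varepsilon S_r^2, \tag{14}$$

which may be represented as follows:

$$\begin{aligned}
& S_r^2 = S_a^2 + S_b^2, \\
& L_\varepsilon S_a^2 + L_\varepsilon S_b^2 - S_a^2 \ge 0, \\
& L_\varepsilon S_b^2 - S_a^2(1 - L_\varepsilon) \ge 0, \\
& \frac{L_\varepsilon}{1 - L_\varepsilon}\,S_b^2 - S_a^2 \ge 0,
\end{aligned} \tag{15}*$$

or, written out in full,

$$\begin{aligned}
& \left[a_0^\circ - b_0^\circ + (a_1^\circ - b_1^\circ)X' + (a_2^\circ - b_2^\circ)Y'\right]^2 \frac{L_\varepsilon}{1 - L_\varepsilon} \\
& \quad - S_a^2\left\{\frac{1}{n_1} + \frac{1}{n_1(1 - r_{xy}'^2)}\left[\frac{(X' - \bar X_1)^2}{\sigma_x'^2} - 2r_{xy}'\frac{(X' - \bar X_1)(Y' - \bar Y_1)}{\sigma_x'\sigma_y'} + \frac{(Y' - \bar Y_1)^2}{\sigma_y'^2}\right]\right. \\
& \qquad \left. + \frac{1}{n_2} + \frac{1}{n_2(1 - r_{xy}''^2)}\left[\frac{(X' - \bar X_2)^2}{\sigma_x''^2} - 2r_{xy}''\frac{(X' - \bar X_2)(Y' - \bar Y_2)}{\sigma_x''\sigma_y''} + \frac{(Y' - \bar Y_2)^2}{\sigma_y''^2}\right]\right\} \ge 0.
\end{aligned}$$

*Pearson, K. Tables of the incomplete beta function. London: Biometrika Office, University College, 1934.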




With this inequality, the region $R$ can be determined in any particular case. The only variables are $X'$ and $Y'$. The curve limiting the region of significance, say $T$, will be a quadratic.

It may be of interest to draw some general conclusions concerning the region $R$. Let us first write formulae in the following abbreviated form:

$$F^2(X', Y')\,w - (P + Q)S_a^2 \ge 0, \tag{16}$$

where $w = L_\varepsilon/(1 - L_\varepsilon)$. It is to be noted that the quantity $(P + Q)S_a^2$ is always positive with the minimum, say $M$, which may vanish only if $\bar X_1 = \bar X_2$ and $\bar Y_1 = \bar Y_2$. It is also noted that the curve corresponding to the equation

$$(P + Q)S_a^2 = \text{any positive constant} \tag{17}$$

is an ellipse, say $E$, with its center at the point where $P + Q$ is at a minimum. This point is called the center of accuracy and is designated by $C_a$. This is the point in which the variance of the best linear estimate $F$ attains its minimum value.

*The $a$'s and $b$'s are continuous variables. Their values minimizing

$$\chi^2 = S^2 = \sum_1 (z_1 - a_0 - a_1 x - a_2 y)^2 + \sum_2 (z_2 - b_0 - b_1 x - b_2 y)^2$$

are denoted by $a^\circ$'s and $b^\circ$'s.

Let us now consider the locus of points $(X', Y')$ in which the estimate $F(X', Y')$ of $\theta(X', Y')$ is equal to zero. This condition corresponds to the equation

$$F(X', Y') = a_0^\circ - b_0^\circ + (a_1^\circ - b_1^\circ)X' + (a_2^\circ - b_2^\circ)Y' = 0, \tag{18}$$

and the line represented by this equation is called the line of non-significance and is denoted by $F$.

It is observed that the quadratic $T$ limiting $R$ does not cut $F$. Actually in each point of $F$, the left-hand side of inequality (16) is negative. Accordingly, the entire line $F$ lies outside the region $R$.

We now transform the coordinates, taking as the new axis of ordinates, say $0\eta$, the line of non-significance $F$ and as the axis of abscissae, $0\xi$, the diameter of the ellipse $E$ conjugate with the direction of $F$. It is immaterial which direction of the new axis of ordinates is considered positive. The direction of the new axis of abscissae may be assumed to be determined by the sign of $F(X', Y')$ in Equation 18. The inequality (16) then has the form:

$$wk^2\xi^2 - A(\xi - \xi_a)^2 - C\eta^2 - M \ge 0, \tag{19}$$

where $k^2$, $A$, $C$ and $M$ are independent of $\xi$ and $\eta$, and $\xi_a$ specifies the new abscissa of the center of accuracy $C_a$.

If $wk^2 \ne A$, then the inequality (19) may be written:

$$(wk^2 - A)\left(\xi + \frac{A\xi_a}{wk^2 - A}\right)^2 - C\eta^2 \ge M + \frac{wk^2 A\,\xi_a^2}{wk^2 - A}, \tag{20}$$

and the region $R$ either lies inside an ellipse or is limited by a hyperbola.

The curve $T$ is an ellipse whenever $wk^2 < A$. If the right-hand side of (20) is positive, the ellipse is imaginary. Then, no hypothesis $H(X', Y')$ should be rejected. Otherwise, the ellipse is real with its center at the point, say $U$, with coordinates

$$\xi_u = -\frac{A\xi_a}{wk^2 - A}, \qquad \eta_u = 0.$$

It is to be noted that $\xi_u$ has the same sign as $\xi_a$ and that $|\xi_u| > |\xi_a|$. It follows that $U$ lies on the new axis of abscissae, so that the center of accuracy $C_a$ falls between $U$ and the intersection of the axis of abscissae with the line of non-significance.














In the case when $wk^2 > A$, the curve $T$ which limits $R$ is a hyperbola. Finally, if $wk^2 = A$, the curve $T$ limiting $R$ is a parabola.

It will be observed that the center of accuracy $C_a$ may or may not lie within the region of significance. It is noteworthy that the region of significance may be comprised of one or of two parts. Where the region is composed of two parts, we reject the hypothesis $H(X', Y')$, stating that $\theta(X', Y') = 0$, in support of the alternative hypothesis, sometimes assuming that $\theta(X', Y')$ is positive and sometimes that it is negative, in accordance with the sign of $F$. If the region of significance is composed of one part only, then, whatever $X'$ and $Y'$, we either accept the hypothesis tested or reject it in favor of only one kind of alternative.

We have considered the comparison between two groups assuming that the preliminary data concerning each individual were the values of $X$ and $Y$. If we had additional information on a third variable $H$, the same method of approach could be followed to obtain a more delicate comparison.* Derivations could be extended to $k$ independent variables. For example, $v_1, v_2, \cdots, v_k$ could supplant $X, Y, H$ and the observation on each called $v_{ij}$.

It should be noted that the likelihood-ratio method, as commonly employed, for any fixed set of parameter values provides a critical region $W\{X, \varepsilon\}$ (in the sample-space of $Y$'s) for the rejection of the hypothesis $H_0$. Now, given a sample point $O_Y$, consider the set $V\{Y, \varepsilon\}$ in the $X$-space, consisting of all points $X$ for which $W\{X, \varepsilon\}$ contains $O_Y$. This is the so-called “region of significance.” Logically it might be thought of as a “no confidence” region, connected with a variable null hypothesis. Thus, if $H_0$ is true, then the probability is less than $\varepsilon$ that, if we draw a sample and determine $V\{Y, \varepsilon\}$, the region $V$ will contain the point $X$.

The evaluation of the region of significance will now be made 
specific for two independent variables both as an illustration and 
to obtain specific results. 


Working Procedure 
The problem presented here is a comparison of the social studies achievement of 90 pupils who excel in the ability to predict the outcome of given events with 90 pupils who perform poorly in predicting the outcome of given events. The null hypothesis to be tested is that no difference exists in mean social studies achievement between superior and inferior readers when the effects of chronological and mental ages are controlled. The statistical problem is to test this hypothesis designated as $H(X', Y')$ and to plot the region of significance if the hypothesis is rejected at a specified level.

*Johnson, Palmer O., and Hoyt, Cyril. On determining three dimensional regions of significance. J. exp. Educ., 1947, 15, 203-212.

Basic to the working of the problem is the computation of the 
necessary sums, squares, and cross-products. These data are present- 
ed in Table 1. 


TABLE 1
Basic Data for the Application of the Johnson-Neyman Technique to Achievement in Social Studies for Superior and Inferior Performers in the Ability to Predict the Outcome of Given Events*

Superior Performers              Inferior Performers
$N_1$ = 90                       $N_2$ = 90
$\Sigma Z_1$ = 2858              $\Sigma Z_2$ = 1522
$\Sigma Z_1^2$ = 95,592          $\Sigma Z_2^2$ = 30,974
$\Sigma X_1$ = 6117              $\Sigma X_2$ = 3752
$\Sigma X_1^2$ = 452,017         $\Sigma X_2^2$ = 188,944
$\Sigma Y_1$ = 1564              $\Sigma Y_2$ = 1834
$\Sigma Y_1^2$ = 29,834          $\Sigma Y_2^2$ = 42,086
$\Sigma Z_1X_1$ = 200,788        $\Sigma Z_2X_2$ = 69,998
$\Sigma Z_1Y_1$ = 49,288         $\Sigma Z_2Y_2$ = 31,425
$\Sigma X_1Y_1$ = 105,578        $\Sigma X_2Y_2$ = 76,902

$Z$ = social studies achievement
$X$ = mental age
$Y$ = chronological age

*To simplify computations all values used in this analysis were changed into code scores. The raw scores were reduced to code scores according to the following formulas:

Mental Age: Raw Score − 100
Chronological Age: Raw Score − 120
Social Studies: Raw Score − 30


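Step One. Solve the following set of equations for $A_1$, $B_1$, $C_1$:

$$\begin{aligned}
A_1 N_1 + B_1\Sigma X_1 + C_1\Sigma Y_1 &= \Sigma Z_1 \\
A_1\Sigma X_1 + B_1\Sigma X_1^2 + C_1\Sigma X_1Y_1 &= \Sigma Z_1X_1 \\
A_1\Sigma Y_1 + B_1\Sigma X_1Y_1 + C_1\Sigma Y_1^2 &= \Sigma Z_1Y_1
\end{aligned}$$

In our example:
$A_1$ = 21.2345    $B_1$ = .1785    $C_1$ = −.0937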

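Step Two. Solve the following set of equations for $A_2$, $B_2$, $C_2$:

$$\begin{aligned}
A_2 N_2 + B_2\Sigma X_2 + C_2\Sigma Y_2 &= \Sigma Z_2 \\
A_2\Sigma X_2 + B_2\Sigma X_2^2 + C_2\Sigma X_2Y_2 &= \Sigma Z_2X_2 \\
A_2\Sigma Y_2 + B_2\Sigma X_2Y_2 + C_2\Sigma Y_2^2 &= \Sigma Z_2Y_2
\end{aligned}$$

In our example:
$A_2$ = 7.17097    $B_2$ = .2004    $C_2$ = .0680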
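Steps One and Two are ordinary least-squares normal equations and can be solved mechanically. The sketch below assumes NumPy and uses the sums of Table 1 as transcribed in this copy; the function name is illustrative.

```python
import numpy as np

def solve_normal_equations(N, SX, SY, SXX, SYY, SXY, SZ, SZX, SZY):
    """Solve the three normal equations of Step One (or Two) for (A, B, C)."""
    lhs = np.array([[N,  SX,  SY],
                    [SX, SXX, SXY],
                    [SY, SXY, SYY]], dtype=float)
    rhs = np.array([SZ, SZX, SZY], dtype=float)
    return np.linalg.solve(lhs, rhs)

# Sums taken from Table 1 (code scores).
A1, B1, C1 = solve_normal_equations(N=90, SX=6117, SY=1564, SXX=452017, SYY=29834,
                                    SXY=105578, SZ=2858, SZX=200788, SZY=49288)
A2, B2, C2 = solve_normal_equations(N=90, SX=3752, SY=1834, SXX=188944, SYY=42086,
                                    SXY=76902, SZ=1522, SZX=69998, SZY=31425)
```

With the sums as transcribed here, the solution agrees with the printed coefficients to roughly two or three significant figures. Small differences, for instance in $A_1$, may reflect rounding in the original hand computation or transcription of the table, so the printed values should not be expected to be reproduced digit for digit.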


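Step Three. Solve the following sets of equations for $A$, $B'_1$, $C'_1$, $B'_2$, $C'_2$:

$$\begin{aligned}
B'_1\Sigma X_1^2 + C'_1\Sigma X_1Y_1 &= \Sigma Z_1X_1 - A\Sigma X_1 \\
B'_1\Sigma X_1Y_1 + C'_1\Sigma Y_1^2 &= \Sigma Z_1Y_1 - A\Sigma Y_1 \\
B'_2\Sigma X_2^2 + C'_2\Sigma X_2Y_2 &= \Sigma Z_2X_2 - A\Sigma X_2 \\
B'_2\Sigma X_2Y_2 + C'_2\Sigma Y_2^2 &= \Sigma Z_2Y_2 - A\Sigma Y_2 \\
B'_1\Sigma X_1 + C'_1\Sigma Y_1 + B'_2\Sigma X_2 + C'_2\Sigma Y_2 &= \Sigma Z_1 + \Sigma Z_2 - A(N_1 + N_2)
\end{aligned}$$

In our example:

$A$ = 12.12203    $B'_1$ = .24580    $C'_1$ = .147138
$B'_2$ = .15551    $C'_2$ = −.06503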

Step Four. Determine the relative and absolute minimums. The absolute minimum, $S_a^2$, and the relative minimum, $S_r^2$, may be obtained by substituting in the following formulas:

$$S_a^2 = \Sigma Z_1^2 + \Sigma Z_2^2 - A_1\Sigma Z_1 - B_1\Sigma Z_1X_1 - C_1\Sigma Z_1Y_1 - A_2\Sigma Z_2 - B_2\Sigma Z_2X_2 - C_2\Sigma Z_2Y_2$$

$$S_r^2 = \Sigma Z_1^2 + \Sigma Z_2^2 - A(\Sigma Z_1 + \Sigma Z_2) - B'_1\Sigma Z_1X_1 - C'_1\Sigma Z_1Y_1 - B'_2\Sigma Z_2X_2 - C'_2\Sigma Z_2Y_2$$

The relationship between $S_a^2$ and $S_r^2$ may be represented as follows: $S_r^2 = S_a^2 + S_b^2$. $S_r^2$ is, therefore, never smaller than $S_a^2$.

In the present case:

$S_a^2$ = 7576.7110    $S_r^2$ = 8024.25353
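Steps Three and Four amount to two least-squares fits: one with a separate plane for each group (the absolute minimum) and one in which the two planes share the intercept $A$ (the relative minimum). The sketch below assumes NumPy, uses the Table 1 sums as transcribed in this copy, and relies on the identity that the residual sum of squares at a least-squares solution equals $\Sigma Z^2$ minus the fitted coefficients times the right-hand sides of the normal equations. Function names are illustrative.

```python
import numpy as np

G1 = dict(N=90, SZ=2858, SZZ=95592, SX=6117, SXX=452017, SY=1564,
          SYY=29834, SZX=200788, SZY=49288, SXY=105578)
G2 = dict(N=90, SZ=1522, SZZ=30974, SX=3752, SXX=188944, SY=1834,
          SYY=42086, SZX=69998, SZY=31425, SXY=76902)

def absolute_minimum(g1, g2):
    """S_a^2: separate (A, B, C) for each group, as in Steps One and Two."""
    total = 0.0
    for g in (g1, g2):
        lhs = np.array([[g["N"],  g["SX"],  g["SY"]],
                        [g["SX"], g["SXX"], g["SXY"]],
                        [g["SY"], g["SXY"], g["SYY"]]], dtype=float)
        rhs = np.array([g["SZ"], g["SZX"], g["SZY"]], dtype=float)
        total += g["SZZ"] - np.linalg.solve(lhs, rhs) @ rhs
    return total

def relative_minimum(g1, g2):
    """S_r^2: common intercept A, unknowns ordered (A, B1', C1', B2', C2')."""
    lhs = np.zeros((5, 5))
    rhs = np.zeros(5)
    lhs[0, 0] = g1["N"] + g2["N"]
    rhs[0] = g1["SZ"] + g2["SZ"]
    for k, g in enumerate((g1, g2)):
        i = 1 + 2 * k                                  # position of this group's B'
        lhs[0, i], lhs[0, i + 1] = g["SX"], g["SY"]
        lhs[i, 0], lhs[i + 1, 0] = g["SX"], g["SY"]
        lhs[i, i], lhs[i, i + 1] = g["SXX"], g["SXY"]
        lhs[i + 1, i], lhs[i + 1, i + 1] = g["SXY"], g["SYY"]
        rhs[i], rhs[i + 1] = g["SZX"], g["SZY"]
    return g1["SZZ"] + g2["SZZ"] - np.linalg.solve(lhs, rhs) @ rhs

S_a2 = absolute_minimum(G1, G2)
S_r2 = relative_minimum(G1, G2)
```

Because the restricted fit can never do better than the unrestricted one, `S_r2` is at least `S_a2`. The numerical values come out close to, though not necessarily identical with, the printed 7576.7110 and 8024.25353, for the same reasons of rounding and transcription that affect the coefficients.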
Step Five. Determine whether a significant difference exists for any system of $(X', Y')$. To do this determine the value of $u$ and its significance.

$$u = \frac{S_a^2}{S_r^2}; \qquad t = \frac{600}{n - s}; \qquad p = \tfrac12(n - s); \qquad q = \tfrac12 r,$$

where $n$ is the total number of observations; $s$ is the number of estimates of parameters used in determining the absolute minimum; and $r$ is the number of equations used in defining the hypothesis.

In our example:

$u$ = .9442;  $t$ = 3.4483;  $p$ = 87;  $q$ = .5.

By interpolation in Table V in the Johnson-Neyman article,* we obtain

$u_{.01}$ = .9625.

Therefore we conclude that the value of $u$ is significant at the 1% level and we reject the hypothesis that no difference exists in social studies mean achievement between superior and inferior readers. If no significant difference is found the problem is complete at this point.
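The arithmetic of Step Five can be retraced in a few lines. In this sketch the sums of squares and the tabled critical value .9625 are the printed ones; $n = 180$, $s = 6$, and $r = 1$ are inferred from the example ($p = 87$, $q = .5$); the tabular argument is assumed to be $600/(n - s)$, which reproduces the printed 3.4483; and $w$ is formed from the critical value, which reproduces the 25.6667 used later in the example. Variable names are illustrative.

```python
S_a2 = 7576.7110        # absolute minimum, Step Four
S_r2 = 8024.25353       # relative minimum, Step Four
n, s, r = 180, 6, 1     # observations, fitted parameters, equations in the hypothesis

u = S_a2 / S_r2                     # likelihood criterion
p, q = (n - s) / 2, r / 2           # arguments of the incomplete beta function
t = 600 / (n - s)                   # tabular argument (assumed form)

u_crit = 0.9625                     # 1% value read from the Johnson-Neyman table
reject = u <= u_crit                # small u speaks against the hypothesis
w = u_crit / (1 - u_crit)           # multiplier used for the region of significance
F = (1 - u) / u * (n - s) / r       # equivalent variance ratio on (r, n - s) d.f.
```

The criterion .9442 falls below the tabled .9625, so the hypothesis is rejected at the 1% level; the equivalent variance ratio is a little over 10 on 1 and 174 degrees of freedom.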
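Step Six. Find the relative minimum corresponding to the hypothesis
H(X’,Y’). This hypothesis may be specified by the following equation:

$$(a_1 + b_1X' + c_1Y') - (a_2 + b_2X' + c_2Y') = 0.$$

By using the method of maximum likelihood, we obtain the following
sets of equations:

$$\begin{aligned}
a_1N_1 + b_1\sum X_1 + c_1\sum Y_1 &= \sum Z_1 + \alpha \\
a_1\sum X_1 + b_1\sum X_1^2 + c_1\sum X_1Y_1 &= \sum Z_1X_1 + \alpha X' \\
a_1\sum Y_1 + b_1\sum X_1Y_1 + c_1\sum Y_1^2 &= \sum Z_1Y_1 + \alpha Y'
\end{aligned}$$

and

$$\begin{aligned}
a_2N_2 + b_2\sum X_2 + c_2\sum Y_2 &= \sum Z_2 - \alpha \\
a_2\sum X_2 + b_2\sum X_2^2 + c_2\sum X_2Y_2 &= \sum Z_2X_2 - \alpha X' \\
a_2\sum Y_2 + b_2\sum X_2Y_2 + c_2\sum Y_2^2 &= \sum Z_2Y_2 - \alpha Y'.
\end{aligned}$$

Solving for $a_1$, $b_1$, and $c_1$:

$$\begin{aligned}
a_1 &= a_1^0 + \alpha(k_{11} + l_{11}X' + m_{11}Y') \\
b_1 &= b_1^0 + \alpha(k_{12} + l_{12}X' + m_{12}Y') \\
c_1 &= c_1^0 + \alpha(k_{13} + l_{13}X' + m_{13}Y').
\end{aligned}$$

Solving for $a_2$, $b_2$, and $c_2$:

$$\begin{aligned}
a_2 &= a_2^0 - \alpha(k_{21} + l_{21}X' + m_{21}Y') \\
b_2 &= b_2^0 - \alpha(k_{22} + l_{22}X' + m_{22}Y') \\
c_2 &= c_2^0 - \alpha(k_{23} + l_{23}X' + m_{23}Y'),
\end{aligned}$$

where the $a^0$’s, $b^0$’s, $c^0$’s, k’s, l’s, and m’s are constants;
$\alpha$ is the undetermined constant multiplier. Let us define:

$$\begin{aligned}
P &= (k_{11} + l_{11}X' + m_{11}Y') + (k_{12} + l_{12}X' + m_{12}Y')X' + (k_{13} + l_{13}X' + m_{13}Y')Y', \\
Q &= (k_{21} + l_{21}X' + m_{21}Y') + (k_{22} + l_{22}X' + m_{22}Y')X' + (k_{23} + l_{23}X' + m_{23}Y')Y',
\end{aligned}$$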


*Johnson and Neyman, op. cit., p. 92.








$$F(X',Y') = (a_1^0 + b_1^0X' + c_1^0Y') - (a_2^0 + b_2^0X' + c_2^0Y'),$$

$$F^2(X',Y') = \{F(X',Y')\}^2.$$
In our example:

$$\begin{aligned}
a_1 &= 21.2345 + \alpha(.269915797 - .002013747X' - .007016472Y'), \\
b_1 &= .17847 + \alpha(-.002013747 + .000027721969X' + .0000074565Y'), \\
c_1 &= -.09372 + \alpha(-.007016472 + .0000074565X' + .0003745978Y'), \\
a_2 &= 7.17097 - \alpha(.104033932 - .00070700196X' - .0042041344Y'), \\
b_2 &= .20036 - \alpha(-.00070700196 + .000030783138X' - .000002900822Y'), \\
c_2 &= .06804 - \alpha(-.0042041344 - .000002900822X' + .0000585051Y'), \\
P + Q &= .373949729 - .005441498X' - .0224412128Y' \\
&\quad + .000058505X'^2 + .000586842Y'^2 + .0000093114X'Y',
\end{aligned}$$


F(X’, Y’) = 14.06353 — .02189X’ — .16176Y’. 


When made equal to zero, F(X’,Y’) is the equation for the line of 
non-significance. 


$$F^2(X',Y') = 197.782876 - .615701343X' - 4.549833Y' + .0004791721X'^2 + .003540926X'Y' + .0261663Y'^2.$$


Using the value of u already obtained in Step Five, it is possible to
obtain the value of w:

$$w = \frac{u}{1 - u}.$$

In our problem, with $u_{.01} = .9625$, w (1%) = 25.6667.
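As a quick check of this conversion, the following sketch evaluates w for the tabled 1% value of u. The function name is hypothetical, and the guard on the range of u is an added assumption rather than something stated in the article.

```python
def w_from_u(u: float) -> float:
    """Convert a value of u into w = u / (1 - u)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return u / (1.0 - u)


# u at the 1% level, as obtained by interpolation in the cited table.
w_1_percent = w_from_u(0.9625)
print(round(w_1_percent, 4))
```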
Step Seven. Plot the 1% region of significance.*

1. Equation for the region of significance:

Boundary $\Phi(X',Y')$: $F^2(X',Y')\,w - (P + Q)\,S_a^2 = 0.$

In our problem

$$\Phi(X',Y')_{.01} = 2243.1247 + 25.42564X' + 53.25138Y' - .430977X'^2 + .8382885X'Y' - 3.774727Y'^2 = 0.$$


*The plotting of the region of significance is a problem of analytical
geometry involving rotation and translation of axes. See, for example,
the working procedures outlined by: Wilson, W. A., and Tracy, J. I.
Analytical Geometry. New York: D. C. Heath and Company, 1937, pp. 180-148.







































As the formula now stands the region can not be readily plotted 
because of the X’Y’ factor. To overcome this difficulty the original 
axes are rotated through the point of origin until the term X’Y’ is 
eliminated. 


New X axis: $Y = \lambda X$. New Y axis: $Y = -\dfrac{1}{\lambda}X$.

General equation:

$$aX^2 + 2hXY + bY^2 + 2gX + 2fY + c = 0,$$

where $\lambda$ is the positive root of $h\lambda^2 + (a - b)\lambda - h = 0$,
or in terms of the working formula,

$$\lambda = \frac{-(a - b) + \sqrt{(a - b)^2 + 4h^2}}{2h}.$$

In our problem:

a = −.430977     b = −3.774727     c = 2243.1247
f = 26.62569     g = 12.71282      h = .419144
a − b = 3.34375

$$\lambda = \frac{-3.34375 + \sqrt{11.180664 + .702730}}{.838289} = .123338.$$

New X axis: Y = .123338X. New Y axis: Y = −8.10773X.
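The working formula for the rotation can be sketched as below. This is an illustration, not the authors' worksheet: the function name is hypothetical, and the sketch relies on the fact that the two roots of the quadratic have product −1, so that one root gives the slope of the new X axis and the other the slope of the new Y axis. With the coefficients printed above it yields a slope of about .123, in line with the hand-computed value.

```python
import math


def rotation_slopes(a: float, b: float, h: float) -> tuple[float, float]:
    """Return (positive root, negative root) of h*L**2 + (a - b)*L - h = 0.

    The positive root is the slope of the new X axis; the negative root
    (equal to -1 / positive root) is the slope of the new Y axis.
    """
    if h == 0:
        raise ValueError("no XY term is present, so no rotation is needed")
    diff = a - b
    root = math.sqrt(diff * diff + 4.0 * h * h)
    candidates = ((-diff + root) / (2.0 * h), (-diff - root) / (2.0 * h))
    return max(candidates), min(candidates)


# Coefficients of the boundary equation as printed in the example.
a, b, h = -0.430977, -3.774727, 0.419144
slope_new_x, slope_new_y = rotation_slopes(a, b, h)
print(round(slope_new_x, 3), round(slope_new_y, 2))
```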
Let A and B represent the new coefficients of X² and Y² respectively.

$$A + B = a + b \quad \text{and} \quad AB = ab - h^2.$$

In our problem:

AB = 1.451142
A + B = −4.205704

Find the value of B by substituting in the formula:

$$B = \frac{(a + b) \pm \sqrt{(a - b)^2 + 4h^2}}{2} = \frac{-4.205704 \pm \sqrt{11.180664 + .70273}}{2} = -.37928 \text{ or } -3.82642.$$


2. Center of region, $X''_0$, $Y''_0$:

The original formula for the region has now been simplified by
removing the X’Y’ factor. However, four factors (X, Y, X², Y²)
remain. This formula can be further simplified by removing the X
and Y factors through the process of translation of axes. This
consists of establishing new axes, parallel with the rotated axes,
with the center of region ($X''_0$, $Y''_0$) as their origin. The
center of region may be obtained by solving the simultaneous
equations

$$aX''_0 + hY''_0 + g = 0 \quad \text{and} \quad hX''_0 + bY''_0 + f = 0.$$

In our problem:

−.430977X″₀ + .419144Y″₀ + 12.71282 = 0,
.419144X″₀ + (−3.774727)Y″₀ + 63.52397 = 0,
X″₀ = 11.58,
Y″₀ = 40.76.


The basic formula for the region has now been simplified to
where it consists of X² and Y² plus a constant. However,
before determining boundary values of X and Y it is necessary
to determine the shape of the region. This can be done
by examining the relationship between the coefficients of
X², Y², and XY in the basic formula for the region.

If h² − ab is less than zero the region is an ellipse; if it is equal
to zero the region is a parabola; and if it is greater than zero the
region is a hyperbola. In our problem h² − ab is less than zero and
the region of significance is, therefore, an ellipse.


3. Equation of boundary of region:

Since the shape of the region is an ellipse, the basic formula for
the region simplified by rotation and translation may now be
expressed as

$$AX^2 + BY^2 = -C'$$

where A and B are the coefficients of X² and Y² found after the axis
rotation and C’ is a new constant value replacing c in the original
formula. The value of C’ may be determined by substituting in the
formula

$$C' = \frac{D}{ab - h^2}$$

where D is the determinant

$$D = abc + 2fgh - bg^2 - af^2 - ch^2.$$





FIGURE 1. A comparison of the social studies achievement of superior
and inferior readers in the ability to predict the outcome of given
events when the effects of chronological and mental ages are
statistically controlled (favors superior readers at the one per cent
level of significance).
[Scatter plot. Axes: mental age (100 to 240) and chronological age.
Legend: superior readers, inferior readers, line of non-significance.]


In our problem: 


D = 4454.426416 
C’ = 3069.600304 
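The shape test and the constants of the simplified equation can be reproduced with a short sketch. The function names are hypothetical; assigning the root of smaller magnitude to A (the X² coefficient) is an assumption, chosen because it agrees with the boundary points tabulated in the example. With the printed coefficients the sketch gives values close to those reported (A near −.379, B near −3.826, C’ near 3069.6), though not identical in the last digits.

```python
import math


def classify_conic(a: float, b: float, h: float) -> str:
    """Classify aX^2 + 2hXY + bY^2 + ... = 0 by the sign of h^2 - ab."""
    discriminant = h * h - a * b
    if discriminant < 0:
        return "ellipse"
    if discriminant == 0:
        return "parabola"
    return "hyperbola"


def canonical_form(a, b, c, f, g, h):
    """Return (A, B, C') such that A*X**2 + B*Y**2 = -C' after
    rotation and translation of axes."""
    determinant = a * b * c + 2.0 * f * g * h - b * g * g - a * f * f - c * h * h
    root = math.sqrt((a - b) ** 2 + 4.0 * h * h)
    coef_a = ((a + b) + root) / 2.0   # assumption: root of smaller magnitude goes with X^2
    coef_b = ((a + b) - root) / 2.0
    return coef_a, coef_b, determinant / (a * b - h * h)


# Coefficients of the boundary equation as printed in the example.
a, b, c = -0.430977, -3.774727, 2243.1247
f, g, h = 26.62569, 12.71282, 0.419144
shape = classify_conic(a, b, h)
A, B, c_prime = canonical_form(a, b, c, f, g, h)
print(shape, round(A, 4), round(B, 4), round(c_prime, 1))
```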


Since the values of A, B, and C’ are known, it is now possible to 
solve for Y in terms of X for several points of the boundary of the 
region and to plot the region. 

Values of Y in terms of X may be obtained by changing the basic
formula $AX^2 + BY^2 = -C'$ to

$$Y = \pm\sqrt{\frac{-C' - AX^2}{B}}$$

and solving.
In our problem:

X =     0       5      25      50      80      85     90
Y = ±28.32  ±28.28  ±27.21  ±23.55  ±13.0   ±9.3     0
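The boundary values in this table can be regenerated with a sketch like the one below, which solves the simplified equation for Y at each X. The constants are the values of A, B, and C’ as they appear in the example (the decimal point in A is restored on the assumption that the scan dropped it); the function name is hypothetical, and a slightly negative radicand at the edge of the ellipse is treated as zero.

```python
import math

# Constants of A*X**2 + B*Y**2 = -C' as given in the example.
A = -0.37928
B = -3.82642
C_PRIME = 3069.600304


def boundary_y(x: float) -> float:
    """Positive ordinate of the boundary at abscissa x (0.0 beyond the edge)."""
    radicand = (-C_PRIME - A * x * x) / B
    return math.sqrt(radicand) if radicand > 0.0 else 0.0


xs = [0, 5, 25, 50, 80, 85, 90]
ys = [boundary_y(x) for x in xs]
for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

The plotted region is symmetric about the new X axis, so each ordinate is used with both signs.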





We now possess sufficient information to plot the region of signifi- 
cance graphically which is presented here as Figure 1. 


4. Center of Accuracy (point where the variance of F is a minimum).
The value $S_a^2(P + Q)$ is made equal to $\Psi(X',Y')$ and
the partial derivatives for X’ and Y’ are taken from it. To find
the center of accuracy, the partial derivatives are set equal to
zero and the resulting equations are solved for $X'_0$ and $Y'_0$.

In our problem:

$$\frac{\partial \Psi}{\partial X'} = -41.22866 + .88655X' + .07055Y' = 0$$

$$\frac{\partial \Psi}{\partial Y'} = -170.03058 + .07055X' + 8.89266Y' = 0$$

X’₀ = 45.26885

Y’₀ = 15.5289


Step Nine. Determine whether the region of significance favors
superior or inferior readers:

Values of points within the region of significance are substituted
in the formula for F(X’,Y’) and the formula is solved. If the
results are positive, the region favors that group upon whom the
parameters $a_1$, $b_1$, and $c_1$ are based; and if the results are
negative the region favors the group upon whom the parameter values
$a_2$, $b_2$, and $c_2$ are based. In our problem the results are
positive, and $a_1$, $b_1$, $c_1$ are derived from the superior
readers. The region, therefore, favors the superior readers.


Step Ten. The problem of estimation: 


Whenever the center of accuracy lies within the region of
significance a significant difference exists between the mean
performance of the two groups being compared. The best estimate of
the true difference between the mean performance of the two groups
can be obtained by substituting the values of the center of accuracy
in the formula for F(X’,Y’).

In our problem:

$$F(X'_0, Y'_0) = 14.06353 - .02189(45.26885) - .16176(15.5289) = 12.82139.$$
It can, therefore, be said that the best estimate of the true difference 
between the means of superior and inferior achievers in reading is 
12.82 in favor of the superior readers. The sign has the same sig- 
nificance here as in Step 9. 

If the center of accuracy were outside the region of significance 
the best estimate of the true difference between the mean perform- 
ance of the groups being compared would be zero. 

By determining the standard error of the difference it is possible 
to set up the limits within which the true difference would lie for a 
given fiducial probability. The standard error of the difference is 
found by using the formula 


$$\sigma_F = \sqrt{\frac{S_a^2\,F^2}{(n - s)\,S_b^2}}$$

where

$S_a^2$ = absolute sum of squares,
$F^2$ = best estimate of difference squared,
n = number of observations,
s = number of parameters, and
$S_b^2$ = difference between the relative and absolute sums of
squares.

In our problem:

$$\sigma_F = \sqrt{\frac{7{,}576.711\,(12.82)^2}{174\,(448.54)}} = 4.006.$$

The fiducial probability is .99 that the true population difference
between means lies within the fiducial limits 2.50 and 23.14.
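The standard error and the fiducial limits can be sketched as follows. The article does not say which multiplier was used for the .99 limits; the sketch assumes the normal deviate (about 2.576), which reproduces the printed limits when the printed standard error of 4.006 is used. Evaluating the formula directly with the printed quantities gives a standard error of about 3.99, close to but not identical with the printed figure. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def standard_error_of_difference(s_a2, f_hat, n_minus_s, s_b2):
    """sqrt(S_a^2 * F^2 / ((n - s) * S_b^2)), as given in Step Ten."""
    return math.sqrt(s_a2 * f_hat ** 2 / (n_minus_s * s_b2))


def fiducial_limits(estimate, std_error, probability=0.99):
    """Symmetric limits; the normal multiplier is an assumption."""
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Quantities as printed in the example.
se = standard_error_of_difference(7576.711, 12.82, 174, 448.54)
lower, upper = fiducial_limits(12.82, 4.006)
print(round(se, 3), round(lower, 2), round(upper, 2))
```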


Interpretation of the Region of Significance 

The plotting of the superior and inferior readers indicates the 
nature of the present sample but does not enter further into the in- 
terpretation of the region of significance. Rather, the region of sig- 
nificance describes an area in terms of the control (matching) vari- 
ables for which the null hypothesis would be rejected; that is, the 
null hypothesis will continue to be refuted for all samples whose in- 
dividual or mean values on the control variables fall within the re- 
gion of significance. 

In our problem, for the population represented by this sample, 
superior sixth grade readers having mental ages ranging from 100 
to 229 months and chronological ages ranging from 120 to 167 
months are significantly better in social studies mean achievement 
than are poor readers within the same ranges on the control vari- 
ables. These limits are approximations, as the region of significance 
is not rectangular. The exact limits may be obtained by examining 
the boundary of the region in Figure 1. 

The unique contribution of the Johnson-Neyman technique, then, 
is that it defines the population for all systems of values of the con- 
trol variables, that is, the H(X’,Y’)’s, for which the conclusion of 
significant difference of mean performance on the criterion may be 
held. 


REFERENCES 


1. Bond, Austin D. An experiment in the teaching of genetics. Teach. Coll.
Contr. Educ., No. 797, 1940.

2. Bond, Eva. Reading and ninth grade achievement. Teach. Coll. Contr.
Educ., No. 756, 1938.

3. Clark, Ella C. An experimental evaluation of the school excursion. J. exper.
Educ., 1943, 12, 10-19.

4. Deemer, Walter L., and Rulon, Phillip J. An experimental comparison of
two shorthand systems. Harvard Studies in Education, No. 28, 1948.

5. Deemer, Walter L. An empirical study of the relative merits of Gregg
shorthand and Script shorthand. Harvard Studies in Education, No. 22,
1944.

6. Fay, Leo C. The relationship between specific reading skills and selected
areas of sixth grade achievement. Ph.D. Thesis, University of Minnesota,
1948.

7. Fisher, R. A., and Yates, F. Statistical tables for biological, agricultural
and medical research. London: Oliver and Boyd, 1943.

8. Fine, Henry, and Thompson, Henry. Coordinate geometry. New York:
Macmillan Co., 1911.














9. Hansen, Carl W. Factors associated with successful achievement in
problem solving in sixth grade arithmetic. J. educ. Res., 1944, 38, 111-118.

10. Hoyt, Cyril. Tests of certain linear hypotheses and their application to
problems in elementary college physics. Ph.D. Thesis, University of
Minnesota, 1944.

11. Johnson, Harry C. The effect of instruction in mathematical vocabulary
upon problem solving in arithmetic. J. educ. Res., 1944, 38, 97-110.

12. Johnson, Palmer O. The measurement of the effectiveness of laboratory
procedures upon the achievement of students in zoology with particular
reference to the use and value of detailed drawings. Proceedings of the
Minnesota Academy of Science, 1940, 8, 70-76.

13. Johnson, Palmer O. The increase in precision in educational and
psychological experimentation through statistical controls. J. educ. Res.,
1944, 38, 149-152.

14. Johnson, Palmer O., and Hoyt, Cyril. On determining three dimensional
regions of significance. J. exper. Educ., 1947, 15, 203-212.

15. Johnson, Palmer O., and Neyman, J. Tests of certain linear hypotheses and
their applications to some educational problems. Statistical Research
Memoirs, 1936, 1, 57-93.

16. Koenker, Robert H., and Hansen, Carl W. Steps for the application of the
Johnson-Neyman technique: a sample analysis. J. exper. Educ., 1942, 10,
164-173.

17. Kolodziejczyk, Stanislaw. On an important class of statistical hypotheses.
Biometrika, 1935, 27, 161-190.

18. Neyman, J., and Pearson, E. S. On the use and interpretation of certain
test criteria for the purpose of statistical inference. Biometrika, 1928, 20,
175-240.

19. Pearson, K. Tables of the incomplete beta function. London: Biometrika
Office, University College, 1934.

20. Peterson, Shailer. The evaluation of a one-year course, the fusion of physics
and chemistry, with other physical science courses. Science Education, 1945,
29, 255-264.

21. Treacy, John P. The relationship of reading skills to the ability to solve
arithmetic problems. J. educ. Res., 1944, 38, 86-96.

22. Wilson, W. A., and Tracy, J. I. Analytical geometry. New York: D. C.
Heath and Company, 1937.


Manuscript received 9/14/49 
Revised manuscript received 5/5/50 

















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


THE COMPARABILITY OF SCORES FROM THREE 
MATHEMATICS TESTS OF THE 
COLLEGE ENTRANCE EXAMINATION BOARD* 


DOUGLAS G. SCHULTZ 
EDUCATIONAL TESTING SERVICE 


Scores from three mathematics tests of the College Entrance
Examination Board were examined in order to determine the effect
on the scores of (1) choice of test, (2) amount of training in
mathematics, and (3) recency of training in mathematics. Groups of
candidates were paired in a number of comparisons and matched by
means of a regression technique which is described. On the average,
students of similar ability made comparable scores on the
mathematical section of the Scholastic Aptitude Test and on the
Comprehensive Mathematics Test. The scores of candidates who
took the Intermediate Mathematics Test averaged substantially
higher than those of comparable students who took either of the
other two tests. A greater amount of mathematical training and
more recent training were both found to be positively related to
scores on the mathematical section of the Scholastic Aptitude Test
and on the Intermediate Mathematics Test, but the effect of recency
appeared to be less than one might expect.


Introduction 


At its regular series of examinations administered during 1948, 
the College Entrance Examination Board offered three mathematics 
tests, each designed to be appropriate for students at a different 
level of proficiency. Each mathematics test was joined with the ver- 
bal section of the Scholastic Aptitude Test to form a “Program,” and 
these three combinations were given simultaneously in the morning 
every time a series was held. The “Program” chosen by any par- 
ticular candidate depended not only upon his background in mathe- 
matics but also upon the recommendations or requirements of the 
colleges to which he was applying. Many colleges asked the student 
to take the highest level test suited to his training. Yet a student 
who had just finished four years of secondary school mathematics 
might have taken the lowest-level test of the three if a college he se- 
lected required it of all applicants or permitted entirely free choice. 

Because the “Program” arrangement had just been introduced
in December, 1947, and was therefore very new at the time, there
was some uncertainty about how best to bring together, for an
accurate and meaningful comparison, scores from diverse tests and
backgrounds. Also, although the tests had originally been scaled so that
scores from all three were presumably comparable for persons of
similar ability, it had not been possible to make an over-all check. In
order to obtain some experimental evidence in relation to these
problems, a study was initiated to determine the effect on mathematics
test scores of (1) choice of “Program,” (2) amount of training in
mathematics, and (3) recency of training in mathematics.

*The author is indebted to Mrs. L. B. Plumlee of the Educational Testing
Service for her extensive aid in carrying out this project.


The Tests 
Specifically, the three College Board tests involved were the fol- 
lowing: * 


1. Scholastic Aptitude Test, Mathematical Section (SAT-M): 
a one-hour test intended for students with less than two and 
one-half years of high school mathematics and for students 
not currently studying mathematics. 

2. Intermediate Mathematics Test (IMT): a ninety-minute test 
for students with two and one-half to three years of high 
school mathematics. 

3. Comprehensive Mathematics Test (CMT): a two-hour test 
for students with three and one-half to four years of high 
school mathematics, including trigonometry. 


Experimental Design 


The design of the experiment called for isolating groups of can- 
didates who were alike in choice of test, amount of mathematical 
training, and recency of training, and then comparing pairs of 
groups whose members differed in only one of the characteristics. 
For example, among those who had three years of secondary school 
mathematics and who took the Intermediate Mathematics Test, two 
groups could be identified: one composed of students who finished
their last mathematics course in 1948; the other composed of stu- 
dents who completed their last course in 1947. Comparison of these 
two populations would provide an evaluation of the effect of recency 
of training, with amount of training and test held constant. By suc- 
cessive, appropriate pairings of groups in this manner, the influence 
of any one of the three factors could be examined while the other 
two were invariant. 

*Beginning in December, 1949, only the two Scholastic Aptitude Test
sections are being given in the morning. Tests in intermediate and
advanced mathematics are now included as one-hour achievement tests
which can be taken in the afternoon.













Full information as to the amount and recency of mathematical 
training was available from a questionnaire which had been com- 
pleted by all candidates who took College Board tests in the series 
of April, 1948. This form inquired about all the courses taken in 
secondary school. 

It was apparent, however, that another variable had to be con- 
trolled. In any comparison of the kind described above, the two 
groups could not be assumed to be equal in what might be called 
“basic mathematical ability.” For instance, those who finish their 
mathematics work a year before they take the College Board test 
might be inherently better or poorer students than those who finish 
the same year they take the test, even though both groups have had 
the same amount of training and take the same test. A lack of fun- 
damental equality in skill might overshadow any difference in the 
test scores due to recency of training, unless the inequality were 
taken into account. 

A great deal of thought was devoted to the selection of a meas- 
ure of this capacity for mathematics. It was finally decided to use 
the student’s mark in the mathematics course (usually elementary 
algebra) which he had taken during the ninth grade of school. The 
fallibility of this index was fully realized, since it is well known that 
grades cover many more elements than just ability in the subject and 
that they vary greatly from school to school. On the other hand, the 
ninth-grade mathematics mark is indicative of the student’s readi- 
ness for advanced mathematical work and is based on a similar 
course among schools. Scores on a standardized mathematics test 
administered before the students entered secondary school would 
probably have provided a better measure. But because no such scores 
were available for the graduates of June, 1948, and because no re- 
sults of the study could have been determined for four years if such 
a test were administered to students entering secondary school at the 
time, the mathematics grade appeared to be the only practicable 
means of equating groups on this variable. 


The Subjects 

The subjects included in the study can reasonably be assumed
to be representative of the entire population which took the April,
1948, series of College Board tests. From a list of all schools which 
sent candidates to this series, a large number were selected at ran- 
dom and were asked to submit ninth-grade mathematics marks for 
their pupils. A large proportion responded. Of the grades received, 
only those which were given by the school in percentage terms were 



















372 PSYCHOMETRIKA 


utilized. No adjustments were made for any school-to-school differ- 
ences in grading systems or standards. However, because of possible 
variation in the general educational situation as well as in grading, 
computations were made separately for public and independent 
schools. 


If students from a particular school were to form a major part 
of any group, a difference found in a comparison with another group 
might be attributed to the teaching characteristics of that school. 
Therefore, a check was made of the school distribution in the sam- 
ples finally selected. This showed that the smallest number of insti- 
tutions represented in a group was 57 and the number ranged up to 
156. No school contributed over 11% of the cases to any group, with 
the exception of one in which 23% came from the same institution. 
Even in the latter instance, however, the remainder of the cases scat- 
tered over many sources. 


Method for Matching Groups 


In each comparison, the two groups were “matched” for “basic 
mathematical ability” by means of a regression technique which in- 
volved the following, consecutive steps: 


1. The regression coefficient of College Entrance Examination 
Board test scores (y) on ninth-grade mathematics marks (x) was 
computed independently for each group. 

2. The reliability of the difference between the regression co- 
efficients was then determined. If this difference was not significant, 
it could be assumed that the regression lines were parallel and the 
next steps could legitimately be taken.


3. A weighted, combined value of the regression coefficient was 
calculated in order to get the “best estimate” of the slope of the par- 
allel regression lines. 

4. Using the “best estimate,” a line with this slope was placed
through the centroid of each sample. 

5. The difference and the reliability of the difference between 
the y intercepts of these two lines were computed. 
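The five steps can be illustrated with a minimal sketch that works from raw (mark, score) pairs and returns point estimates only. Several details are assumptions rather than statements from the article: the standard error of each slope is taken in its large-sample form, s_y·sqrt(1 − r²)/(s_x·sqrt(n)), with standard deviations computed using divisor n; the combined slope is an inverse-variance weighted mean; and all names and data are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GroupFit:
    n: int
    mean_x: float
    mean_y: float
    slope: float
    se_slope: float


def fit_group(xs, ys):
    """Step 1: regression of test score (y) on ninth-grade mark (x) for one group."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    # Assumed large-sample standard error of the slope (see lead-in).
    se_slope = sqrt((syy / n) * (1.0 - r_squared) / sxx)
    return GroupFit(n, mean_x, mean_y, slope, se_slope)


def match_groups(g1, g2):
    """Steps 2-5: critical ratio of the slope difference, combined slope,
    and the difference between the y intercepts of the parallel lines."""
    cr_slopes = (g1.slope - g2.slope) / sqrt(g1.se_slope ** 2 + g2.se_slope ** 2)
    w1, w2 = 1.0 / g1.se_slope ** 2, 1.0 / g2.se_slope ** 2
    pooled_slope = (w1 * g1.slope + w2 * g2.slope) / (w1 + w2)
    a1 = g1.mean_y - pooled_slope * g1.mean_x  # line through centroid of group 1
    a2 = g2.mean_y - pooled_slope * g2.mean_x  # line through centroid of group 2
    return cr_slopes, pooled_slope, a1 - a2


# Hypothetical marks and scores, invented purely for illustration.
group1 = fit_group([70, 80, 90, 100], [410, 440, 510, 540])
group2 = fit_group([75, 85, 95, 105], [380, 400, 470, 490])
cr_slopes, pooled_slope, intercept_diff = match_groups(group1, group2)
print(round(cr_slopes, 3), round(pooled_slope, 3), round(intercept_diff, 3))
```

The intercept difference returned here is the raw mean difference in scores less the combined slope times the mean difference in marks, which is the "corrected" score difference the method aims at.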


It will be seen that the method first tests the separate regres- 
sion lines for parallelism and then takes out by regression the effects 
on test scores of differences in ninth-grade mathematics marks. The 
y intercept difference obtained may be looked upon as a College
Entrance Examination Board score difference, “corrected” for
inequalities in “basic mathematical ability,” i.e., a College Entrance
Examination Board score difference between two groups who are of similar
ability as measured by ninth-grade marks. Figure 1 illustrates this
graphically. The vertical distance, d, in any column between the two
parallel regression lines is the same as the difference between the y
intercepts, $a_1$ and $a_2$, of the parallel lines. Therefore, the y
intercept difference may be taken as the test score difference between
similar-ability groups.

FIGURE 1
The Use of the y Intercept Difference between Parallel Regression Lines as
the CEEB Test Score Difference between Similar-Ability Groups.
[Line graph: CEEB Math. Test Score (y) against Ninth-Grade Math. Mark (x),
with parallel regression lines for Group 1 and Group 2.]

The standard errors of estimate for the two samples were not 
investigated since it was felt that in this situation there was no rea- 
son to demand that they be equivalent. Most of the groups selected 
were known to differ from each other in certain respects so that it 
seemed most reasonable to conceive of each group as a sample from 
a population having its characteristics, rather than a sample from 
the same homogeneous population as the group with which it was 
being compared. Of course, it was still necessary for the regression
lines in the group pairs to be parallel; otherwise the test score
“correction” would vary with the particular ninth-grade mathematics
mark level taken. Permitting the standard errors of estimate to be 
different in the groups is a modification of the technique proposed 
by Gulliksen and Wilks,* which is basically similar to the usual 
analysis of covariance methods. Their first step is a test of whether 
the standard errors of estimate could be expected to arise in ran- 
dom samples from the same population. The procedures they give 
for testing for parallel regression and for the significance of the in- 
tercept difference assume, and so depend upon finding, equality at 
this first step. The inclusion of the test for parallelism in the
present method is essentially a slight extension of the procedure
described by Peters.†

For the reliability of the difference between the regression co- 
efficients, the usual critical ratio was computed. For the combined 
“best estimate” of the regression coefficient, a weighted mean of 
the two independent coefficients was used, each one being weighted 
in the combination by the inverse of the square of its standard 
error. In obtaining the critical ratio of the difference between y in- 
tercepts of the two parallel lines, the standard error of the differ- 
ence was found by taking the square root of the following expres- 
sion: 


$$\sigma^2_{(a_1 - a_2)} = \frac{s_{y_1}^2(1 - r_1^2)}{n_1} + \frac{s_{y_2}^2(1 - r_2^2)}{n_2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{n_1 s_{x_1}^2}{s_{y_1}^2(1 - r_1^2)} + \dfrac{n_2 s_{x_2}^2}{s_{y_2}^2(1 - r_2^2)}} \qquad (1)$$

where

a = the y intercept of the parallel line giving the “best estimate”
of the regression of College Entrance Examination Board test
scores (y) on ninth-grade mathematics marks (x),

$s_y$ = the standard deviation of the College Entrance Examination
Board test scores,

$s_x$ = the standard deviation of the ninth-grade mathematics marks,

r = the correlation between College Entrance Examination Board test
scores and ninth-grade mathematics marks,

$\bar{x}$ = the mean of the ninth-grade mathematics marks,

n = the number of students, and

subscripts 1, 2 = the first and second groups, respectively, in each
comparison.

*Gulliksen, Harold, and Wilks, S. S. Regression tests for several samples.
Psychometrika, 1950, 15, 91-114.

†Peters, C. C. A method of matching groups for experiment with no loss
of population. J. educ. Res., 1941, 34, 606-612.

The following development indicates briefly the method by which 
this formula for the standard error of the y intercept difference was 
obtained.* Let us use the symbols as they are listed above, adding 


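b = the combined “best estimate” of the regression coefficient as
defined above,

$\beta$ = the “true” value of b, and

$\alpha$ = the “true” value of a.

Then, the difference between the y intercepts in the two samples
may be expressed as a deviation from its population or “true”
value by

$$d = (a_1 - a_2) - (\alpha_1 - \alpha_2). \qquad (2)$$

Since $a_1 = \bar{y}_1 - b\bar{x}_1$ and $a_2 = \bar{y}_2 - b\bar{x}_2$, we can write

$$d = (\bar{y}_1 - b\bar{x}_1 - \alpha_1) - (\bar{y}_2 - b\bar{x}_2 - \alpha_2) \qquad (3)$$

or

$$d = (\bar{y}_1 - \alpha_1 - \beta\bar{x}_1) - (\bar{y}_2 - \alpha_2 - \beta\bar{x}_2) - (b - \beta)(\bar{x}_1 - \bar{x}_2). \qquad (4)$$

Squaring both sides of (4), we get

$$\begin{aligned}
d^2 = {} & (\bar{y}_1 - \alpha_1 - \beta\bar{x}_1)^2 + (\bar{y}_2 - \alpha_2 - \beta\bar{x}_2)^2 + (b - \beta)^2(\bar{x}_1 - \bar{x}_2)^2 \\
& - 2(\bar{y}_1 - \alpha_1 - \beta\bar{x}_1)(\bar{y}_2 - \alpha_2 - \beta\bar{x}_2) \\
& - 2(\bar{x}_1 - \bar{x}_2)(b - \beta)(\bar{y}_1 - \alpha_1 - \beta\bar{x}_1) \\
& + 2(\bar{x}_1 - \bar{x}_2)(b - \beta)(\bar{y}_2 - \alpha_2 - \beta\bar{x}_2). \qquad (5)
\end{aligned}$$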


The mean value of d², taken with respect to pairs of samples, is
the sampling variance of the intercept difference. Considering the
$\bar{x}$’s as constants, it can be shown that when the mean of d² is
obtained, the last three terms of (5) become zero. This leaves

$$\sigma^2_{(a_1 - a_2)} = \sigma^2_{(\bar{y}_1 - \alpha_1 - \beta\bar{x}_1)} + \sigma^2_{(\bar{y}_2 - \alpha_2 - \beta\bar{x}_2)} + (\bar{x}_1 - \bar{x}_2)^2\,\sigma^2_{(b - \beta)}. \qquad (6)$$

The first two terms in this expression are the squared standard
errors of the sample means, where the y’s are taken as deviations
from the “true” regression line for that population. Thus, the first
term can be estimated by

$$\sigma^2_{(\bar{y}_1 - \alpha_1 - \beta\bar{x}_1)} = \frac{s_{y_1}^2(1 - r_1^2)}{n_1},$$

and the second term can be estimated by a similar expression.
Also, it can be shown that an estimate of $\sigma^2_{(b - \beta)}$ is given by

$$\sigma^2_{(b - \beta)} = \frac{1}{\dfrac{n_1 s_{x_1}^2}{s_{y_1}^2(1 - r_1^2)} + \dfrac{n_2 s_{x_2}^2}{s_{y_2}^2(1 - r_2^2)}}.$$

Substituting these estimates in (6), we have

$$\sigma^2_{(a_1 - a_2)} = \frac{s_{y_1}^2(1 - r_1^2)}{n_1} + \frac{s_{y_2}^2(1 - r_2^2)}{n_2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{n_1 s_{x_1}^2}{s_{y_1}^2(1 - r_1^2)} + \dfrac{n_2 s_{x_2}^2}{s_{y_2}^2(1 - r_2^2)}},$$

which is the same as (1) above. If large samples are used, the
distribution of $(a_1 - a_2)$ will be approximately normal.

*Presented here are the essentials of the derivation of the formula which
was furnished to the author by Professor S. S. Wilks of Princeton University.
The author wishes to express his appreciation to Professor Wilks both for
deriving the formula and for advice in the use of the method described.
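Formula (1) can be evaluated directly from group summary statistics, as in the sketch below. The class, the function names, and the two groups are hypothetical and serve only to show how the three terms combine; the critical ratio is the intercept difference divided by the square root of this variance, which the article treats as approximately normal for large samples.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GroupSummary:
    n: int          # number of students
    mean_x: float   # mean ninth-grade mathematics mark
    s_x: float      # standard deviation of the marks
    s_y: float      # standard deviation of the test scores
    r: float        # correlation between marks and scores


def residual_variance(g: GroupSummary) -> float:
    """s_y^2 * (1 - r^2): variance of scores about the regression line."""
    return g.s_y ** 2 * (1.0 - g.r ** 2)


def intercept_difference_variance(g1: GroupSummary, g2: GroupSummary) -> float:
    """Sampling variance of (a1 - a2) according to formula (1)."""
    term1 = residual_variance(g1) / g1.n
    term2 = residual_variance(g2) / g2.n
    slope_precision = (g1.n * g1.s_x ** 2 / residual_variance(g1)
                       + g2.n * g2.s_x ** 2 / residual_variance(g2))
    term3 = (g1.mean_x - g2.mean_x) ** 2 / slope_precision
    return term1 + term2 + term3


def critical_ratio(intercept_diff: float, g1: GroupSummary, g2: GroupSummary) -> float:
    return intercept_diff / sqrt(intercept_difference_variance(g1, g2))


# Hypothetical summary statistics, invented purely for illustration.
first = GroupSummary(n=100, mean_x=80.0, s_x=10.0, s_y=100.0, r=0.6)
second = GroupSummary(n=100, mean_x=85.0, s_x=10.0, s_y=100.0, r=0.6)
variance = intercept_difference_variance(first, second)
ratio = critical_ratio(30.0, first, second)
print(round(variance, 3), round(ratio, 3))
```

When the two groups have the same mean mark the third term vanishes, and the variance reduces to the sum of the two squared standard errors about the regression lines.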


Results 

The results of the study are presented in Tables 1, 2, and 3. The 
data are presented separately for public and independent schools. 
Only groups of more than 100 students are included. 

The first three columns in each table identify the groups, each 
pair of which provides a comparison of two groups different in the 
characteristic under consideration but alike in the other two char- 
acteristics. The middle seven columns list data concerning the sepa- 
rate groups. 





TABLE 1
Comparisons of Mathematics Scores Made by Groups Taking Different Tests
(College Entrance Examination Board Candidates of April, 1948)

[Table body not legible in the source scan.]

*Difference between the two College Entrance Examination Board
Mathematics Test means, corrected for any “basic mathematical ability”
differential on the basis of the regression of mathematics test scores
on ninth-grade mathematics marks.

























The last three columns in the tables are of most interest, as they 
include the comparative statistics for each pairing. The first of these, 
labelled “CR (Regress. Coef. Diff.),” gives the critical ratio of the 
difference between the regression coefficients independently computed 
for each member of the pair (and presented in the column immediately
to the left). The last two columns show the test score difference
between the paired groups, corrected for any inequality in “basic
mathematical ability” as described above, and the reliability of
this difference.

Attention should be called to the values of the correlation co- 
efficients found between ninth-grade mathematics marks and College 
Entrance Examination Board test scores. It will be seen that they 
range from .12 to .51, averaging .38 for the public schools and .23 
for the independent schools. These figures are not so high as would 
have been desirable for purposes of this study. They are probably 
lower than would have been the case if it had been possible to make 
adjustments for school-to-school differences in grading systems and 
standards. A better index of basic mathematical ability would likely 
raise these coefficients and introduce a more accurate correction. 

All of the comparisons in Table 1 concern groups of candidates 
who were alike in amount and recency of training but who, for one 
reason or another, chose to take different tests. Of course, only can- 
didates with two and one-half to four years of training had any real 
choice to make, since an individual with less mathematical back- 
ground than that could only have taken Scholastic Aptitude Test — 
Mathematical. 

None of the differences between paired regression coefficients in 
Table 1 can be said to be statistically significant. The highest criti- 
cal ratio is 2.0. It can, therefore, be assumed that the regression 
lines in the paired groups are parallel, and that one is justified in 
proceeding to examine the “corrected” test score differences. 
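The two steps just described (checking the paired regression lines for parallelism, then correcting the score difference) can be sketched in Python. This is only an illustration under stated assumptions: the standard-error formula, the use of a single common slope for the correction, and every number below are hypothetical and are not taken from Table 1 or from the procedure as the study actually carried it out.

```python
import math

def slope_standard_error(r, sd_x, sd_y, n):
    """Standard error of the slope of y on x, from summary statistics."""
    return (sd_y / sd_x) * math.sqrt((1.0 - r * r) / (n - 2))

def slope_critical_ratio(b1, se1, b2, se2):
    """Critical ratio of the difference between two independent slopes."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

def corrected_score_difference(mean_y1, mean_y2, mean_x1, mean_x2, common_slope):
    """Observed mean test-score difference, less the part predicted from
    the difference in ninth-grade marks (assumes parallel regression lines)."""
    return (mean_y1 - mean_y2) - common_slope * (mean_x1 - mean_x2)

# Hypothetical summary figures for two paired groups (not from the tables).
se_a = slope_standard_error(r=0.6, sd_x=10.0, sd_y=100.0, n=102)  # slope 6.0
se_b = slope_standard_error(r=0.5, sd_x=10.0, sd_y=100.0, n=102)  # slope 5.0
cr = slope_critical_ratio(6.0, se_a, 5.0, se_b)
diff = corrected_score_difference(540.0, 500.0, 86.0, 84.0, common_slope=5.5)
print(round(se_a, 3), round(se_b, 3), round(cr, 2), round(diff, 1))
```

A critical ratio well below 2.0, as in this made-up case, is the situation in which the text treats the lines as parallel and goes on to read the corrected difference.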

The “College Entrance Examination Board Test Score Diff. Corrected”
column indicates that, for both public and independent schools, there
was a substantial difference in favor of the Intermediate Mathematics
Test wherever it was taken by one of the paired groups. The critical
ratios are all above 2.0, except for one of 1.7, so that none of the
differences can be disregarded as merely chance occurrences. Using as
a rough figure the average of the eight corrected differences involving
Intermediate Mathematics Test, we can say that students who
took Intermediate Mathematics Test made scores that were, in general,
about 25 points higher than those of comparable students
who took Scholastic Aptitude Test-Mathematics or Comprehensive
Mathematics Test.* On the other hand, it did not seem to matter
whether students with four years of training elected Scholastic
Aptitude Test-Mathematics or Comprehensive Mathematics Test. The
corrected differences between these groups were found to be 0 for
the public school sample and 7 for the independent school sample.
The critical ratios are 0 and 0.9 respectively. The superiority in
ninth-grade mathematics of the students with four years of preparation
who chose Comprehensive Mathematics Test apparently accounted
for all the original difference in the test scores, on the average.

TABLE 2
Comparisons of Mathematics Scores Made by Groups with Different
Amounts of Training
(College Entrance Examination Board Candidates of April, 1948)

[Table body not legible in the source scan.]

*See Table 1 for explanation of how this figure was computed. In this
table, positive difference means higher test scores were made by
students with greater amounts of training.
†These values should be accepted only with caution; see text.

Table 2 is organized around the problem of the effect of added 
training in mathematics. Three of the regression coefficient differ- 
ences appear not to be significant, but the fourth has a critical ratio 
of 3.3. The latter value should probably be minimized as indicating 
a real difference in view of the fact that, of all the CR’s of regres- 
sion coefficient differences in the three tables, it is the only one great- 
er than 2.0. However, the corrected score difference for this particu- 
lar comparison should be interpreted with extreme caution, since the 
condition of parallelism is not met; the last two columns are filled in 
this case only for the sake of completeness. 

The fourth year of mathematics would seem to result in a dis- 
tinct increase — of the magnitude of 30 or 40 points — in test score 
on either Scholastic Aptitude Test-Mathematics or Intermediate 
Mathematics Test. Of the three College Entrance Examination Board 
score differences corrected which can be accepted in Table 2, two are 
large and very significant; the third is moderate, with a marginal CR 
of 1.8. All are in the same direction: as might be expected, higher 
test scores are associated with additional work in mathematics. Even 
the difference whose validity is doubtful falls in with this consist- 
ency of direction. Although the difference is only moderate for In- 
termediate Mathematics Test in the independent school sample, it is 
of considerable size for the same test in the public school comparison. 

Of course, these statistics give no basis for determining whether 
the increases found are due to the learning of specific material or to 
a more general advancement in mathematical maturity. 

It is interesting to notice that only the three-to-four year training
differential is included, since there were insufficient numbers of
two-year students. Apparently the mathematical section of the
Scholastic Aptitude Test was being taken largely by students who
had had more than the minimum amount of preparation. No evidence
is available concerning the effect of additional training on
Comprehensive Mathematics Test. It would be abnormal for a candidate
to have taken that test with less than the full amount of academic
preparation in secondary school.

*Independent investigation of item-analysis data by Dr. L. R. Tucker of the
Educational Testing Service led to the same conclusion. Therefore, beginning
with the series of January, 1949, Intermediate Mathematics Test scores were
adjusted for the discrepancy described here.

TABLE 3
Comparisons of Mathematics Scores Made by Groups Who Have Finished Their
Last Mathematics Course at Different Times
(College Entrance Examination Board Candidates of April, 1948)

[Table body not legible in the source scan.]

*See Table 1 for explanation of how this figure was computed. In this
table, negative difference indicates loss in test scores with the
passage of time.

Table 3 compares groups of candidates studying mathematics at 
the time they took the test with groups that had discontinued their 
study a year before. In each pair, of course, the groups were alike 
in amount of training and in test taken. As most College Board can- 
didates take the tests within one year after finishing their last course 
in mathematics, it was impossible to find satisfactory groups with 
two or more years’ lapse between last course and test. 

None of the regression coefficient differences resulted in a sig- 
nificant critical ratio. The regression lines in each comparison may, 
therefore, be said to be parallel. 

The conclusions about the amount of difference seem to be rather 
clear and consistent for both types of schools. The average loss after 
one year’s inactivity was slight on Scholastic Aptitude Test—Mathe- 
matics, being 7 or 8 points. In neither case is the value statistically 
significant. Intermediate Mathematics Test scores show somewhat 
more loss, 16 points for the public schools and 31 points for the in- 
dependent schools. The first of these figures is barely significant and 
the second is distinctly significant. The contrast of the Intermediate 
Mathematics Test results with those of Scholastic Aptitude Test- 
Mathematics is not great. The fact that all the differences are in the 
same direction tends to increase one’s confidence in the significance 
of each one. No evidence is available for Comprehensive Mathematics 
Test, since almost all candidates who took this test did so as they 
were finishing the fourth year of mathematics. 

The effect of recency would appear to be considerably less than 
one would anticipate. There may be a number of explanations for 
this. Certainly many of the applicants who have not had mathematics 
immediately prior to the testing do some informal reviewing or 
“brushing up” which would not reveal itself as formal study but 
which would bring about definite improvement in performance. The 
tendency to a review of this kind would perhaps be greater among 
candidates planning to take the Intermediate Mathematics Test, be- 
cause it was designed for a higher level of attainment. If achieve- 
ment material is forgotten more readily than is aptitude material, 
more intensive review by Intermediate Mathematics Test candidates 






















would tend to conceal the real contrast between Scholastic Aptitude 
Test-Mathematics and Intermediate Mathematics Test. This prob- 
lem might well be subjected to experimental investigation. 


A Word of Caution 

It would seem desirable to add a word of caution in order to 
avoid any misunderstanding. The point should be stressed that the 
study dealt only with group means; no intercorrelations of tests could 
be determined from these data. Therefore, none of the evidence gives 
any indication whatsoever with respect to the behavior of any one 
individual. For example, while it did not seem to matter on the average
whether students with four years of mathematics elected Scholastic
Aptitude Test-Mathematics or Comprehensive Mathematics
Test, some individuals might have shown relatively greater aptitude 
than achievement, while others might have shown relatively greater 
achievement than aptitude. The former might have had higher scores 
on Scholastic Aptitude Test-Mathematics; the latter, on Compre- 
hensive Mathematics Test. The findings of the study would not be 
inconsistent with such a situation. 


Summary 

The purpose of this study was to obtain empirical evidence re- 
garding the effect on scores from three mathematics tests of (1) 
choice of test, (2) amount of training in mathematics, and (3) recency
of training in mathematics. The findings were based on a representative
sample of candidates taking the April, 1948, series of
College Entrance Examination Board tests. Groups were paired in 
a number of comparisons so that two of the factors under study were 
held constant while the third was varied. The paired groups were 
matched for “basic mathematical ability” by means of a regression 
technique which first tested the separate regressions for parallelism 
and then “corrected” the observed mean difference in test scores for 
any difference in ninth-grade mathematics marks. 

Choice of Test. The results indicated that, on the average, stu- 
dents who had equal ability and who had just completed four years 
of mathematics courses made comparable scores on the mathematical 
section of the Scholastic Aptitude Test and on the Comprehensive 
Mathematics Test. It was found, however, that candidates who took 
the Intermediate Mathematics Test made scores that averaged sub- 
stantially higher than those of students with similar training who 
took either of the other two tests. (This difference was eliminated 
from scores reported for the series of January, 1949, and thereafter.) 
















Amount of Training. Students with four years of secondary 
school mathematics made definitely higher scores, in general, on 
Scholastic Aptitude Test-Mathematics and Intermediate Mathematics 
Test than did those who had only three years of work. It was not 
clear from the study whether Scholastic Aptitude Test-Mathematics 
or Intermediate Mathematics Test was more affected by advanced 
work. 

Recency of Training. As compared with candidates who had 
completed their last mathematics course recently, those who had fin- 
ished a year prior to taking the test made somewhat inferior scores 
on both Scholastic Aptitude Test-Mathematics and Intermediate 
Mathematics Test. The average loss on Intermediate Mathematics 
Test scores after a period of inactivity was slightly greater than for 
Scholastic Aptitude Test-Mathematics, and the contrast between 
the two tests might be clearer in reality if informal review could be 
taken into account. 


Manuscript received 4/6/50 






















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


ON THE EFFECT OF THE CUTTING SCORE WHEN 
SELECTION IS PERFORMED AGAINST A 
DICHOTOMIZED CRITERION* 


Z. W. BIRNBAUM 
UNIVERSITY OF WASHINGTON 


1. Introduction. 


In a number of practical situations a selection from a bivariate 
normal population is performed by retaining only the individuals 
for whom the value of one variate is at least equal to a given cutting 
score, while the other variate is dichotomized and is observed only 
in the selected population. This is, for example, the case in instances 
in which an admission test is used to select candidates for a course 
of training and, after this training is completed, the trainees are 
judged successful or unsuccessful by means of an examination of 
their achievement. In the following we will use the terminology of 
this example, referring to the one variate as “admission test score” 
and to the other as “achievement score,” although the discussion re- 
mains valid for the general case of any bivariate normal population. 
The following problem is often of interest: 

The admission test results in a numerical score $X$, and only
those candidates are admitted to the course of training whose $X$ is at
least equal to a “cutting score” $\xi$. The achievement after completion
of training is not expressed in a numerical score, but is classified in a
dichotomy as “successful” or “unsuccessful”. A certain fraction $R$
of those admitted is thus qualified as successful, and it may be expected
that this fraction will depend on the cutting score: $R = R(\xi)$.
Given sufficient data obtained with a cutting score $\xi_1$, is it possible
to tell how a change of the cutting score will affect this ratio, i.e.,
can one compute $R(\xi)$ for other values of $\xi$? In the following, a
simple method is described which, under fairly general assumptions,
leads to a numerical solution of this problem.


*This paper was prepared under the sponsorship of the Office of Naval 
Research. 


















2. Assumptions. 


Let X be the admission test score, and Y a numerical achieve- 
ment score. While X is actually recorded for each applicant, we shall 
assume that the dichotomous classification of the trainees is based 
on Y, i.e., that Y is implicitly observed (although not recorded) and 
a trainee is consistently declared successful if and only if Y is at 
least equal to some cutting score $\eta$. We shall, furthermore, assume
that $(X,Y)$ has a bivariate normal distribution

$$f(X,Y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(X-a)^2}{\sigma_x^2} - \frac{2\rho(X-a)(Y-b)}{\sigma_x\sigma_y} + \frac{(Y-b)^2}{\sigma_y^2}\right]\right\}$$

and that, for $\xi = \xi_1$, records are available which contain

(i) for each applicant, whether admitted or not, his admission
test score $X$,

(ii) for each applicant admitted to training, his $X$ and his final
classification as successful or unsuccessful.


The number of available records will be assumed large, so that the 
means and variances computed from these records can be considered 
practically equal to those of the universe. 

Under these assumptions the fraction, among those admitted,
of those who successfully complete their training is

$$R(\xi) = \frac{\displaystyle\int_{X=\xi}^{\infty}\int_{Y=\eta}^{\infty} f(X,Y)\,dY\,dX}{\displaystyle\frac{1}{\sqrt{2\pi}\,\sigma_x}\int_{X=\xi}^{\infty} e^{-(X-a)^2/2\sigma_x^2}\,dX}. \tag{1}$$

From the records (i) we can compute the mean $a$ and the variance
$\sigma_x^2$ of $X$. We will, therefore, assume that $a$, $\sigma_x$, and $\xi_1$ are known,
while $b$, $\sigma_y$, $\eta$, and $\rho$ remain unknown. Our problem now becomes:
given $R(\xi_1)$ and any other value of $\xi$, compute $R(\xi)$.


3. Numerical solution of the problem. 


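Introducing standardized variables, we rewrite (1)

$$R(\xi) = \frac{\displaystyle\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{x=h}^{\infty}\int_{y=k}^{\infty} e^{-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}}\,dy\,dx}{\displaystyle\frac{1}{\sqrt{2\pi}}\int_{x=h}^{\infty} e^{-x^2/2}\,dx} = R'(h), \tag{2}$$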






















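where

$$h = \frac{\xi - a}{\sigma_x}, \qquad k = \frac{\eta - b}{\sigma_y},$$

and hence $h$ can be computed for any real $\xi$, while $k$ and $\rho$ are not
known. Writing

$$R'(h)\cdot\frac{1}{\sqrt{2\pi}}\int_{x=h}^{\infty} e^{-x^2/2}\,dx = S(h), \tag{3}$$

we have

$$S(h) = \frac{1}{2\pi\sqrt{1-\rho^2}}\int_{x=h}^{\infty}\int_{y=k}^{\infty} e^{-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}}\,dy\,dx = P(h,k;\rho), \tag{4}$$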


where $P(h,k;\rho)$ is the bi-normal probability tabulated in Tables VIII
and IX in Pearson’s Tables for Statisticians and Biometricians, Part
II.* Since the records (ii) are available for $X \geq \xi_1$, that is for
$x \geq h_1 = (\xi_1 - a)/\sigma_x$, we can compute $R'(h_1)$ and hence $S(h_1)$. We then
choose a number $\xi_2 > \xi_1$, select from the records (ii) only those for
which $X \geq \xi_2$, and compute $S(h_2)$, where $h_2 = (\xi_2 - a)/\sigma_x$. We thus
obtain two equations:

$$S(h_1) = P(h_1,k;\rho), \quad\text{and}\quad S(h_2) = P(h_2,k;\rho), \tag{5}$$

in which $h_1$, $S(h_1)$, $h_2$, $S(h_2)$ are known, while $k$ and $\rho$ are unknown.
If we succeed in solving (5) for $k$ and $\rho$, and substitute the solutions
in (4), we will be able to compute $S(h)$ for any $h$, and hence $R(\xi)$
for any $\xi$, by using Pearson’s tables.

To solve (5) numerically, we look in Pearson’s tables for a value
$\rho$ such that $P(h_1,k;\rho) = S(h_1)$ and $P(h_2,k;\rho) = S(h_2)$ for the same
value of $k$. In most cases some interpolation will be necessary. A
considerable number of computations have shown that linear interpolation,
as indicated in the example of the next section, is sufficiently
accurate.

*Pearson, Karl. Tables for statisticians and biometricians, Part II.
Cambridge, England: The University Press, 1931.
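The solution of (5) can also be sketched without the printed tables. The Python below is a minimal illustration, not the table-lookup procedure of the text: it replaces Pearson's tables with numerical integration of the standardized bi-normal tail and finds $k$ and $\rho$ by nested bisection. The function names, the integration settings, and the bracketing interval for $\rho$ are all assumptions made for this sketch; the counts are those of the example in Section 4.

```python
import math

def upper_tail(h):
    """P(x >= h) for a standard normal variate."""
    return 0.5 * math.erfc(h / math.sqrt(2.0))

def binormal_tail(h, k, rho, steps=120, span=8.0):
    """P(x >= h, y >= k) for standardized bivariate normal x, y with
    correlation rho, by Simpson's rule on a single integral over x."""
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return density * upper_tail((k - rho * x) / s)

    width = span / steps
    total = integrand(h) + integrand(h + span)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(h + i * width)
    return total * width / 3.0

def k_for(h, target, rho, lo=-6.0, hi=6.0):
    """Bisection for the k at which binormal_tail(h, k, rho) equals target."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if binormal_tail(h, mid, rho) > target:
            lo = mid   # tail still too large, so k must be raised
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_k_rho(h1, s1, h2, s2, rho_lo=0.0, rho_hi=0.9):
    """Find the rho at which both equations (5) give the same k.
    Assumes k1(rho) - k2(rho) changes sign once on [rho_lo, rho_hi]."""
    def gap(rho):
        return k_for(h1, s1, rho) - k_for(h2, s2, rho)

    sign_lo = gap(rho_lo) > 0
    for _ in range(30):
        mid = 0.5 * (rho_lo + rho_hi)
        if (gap(mid) > 0) == sign_lo:
            rho_lo = mid
        else:
            rho_hi = mid
    rho = 0.5 * (rho_lo + rho_hi)
    return k_for(h1, s1, rho), rho

def success_ratio(h, k, rho):
    """R'(h) = P(h, k; rho) / P(x >= h), as in equations (2) to (4)."""
    return binormal_tail(h, k, rho) / upper_tail(h)

# Counts from the example of Section 4 (cutting scores 60 and 65).
h1, h2 = 1.0, 1.5
s1 = (2943 / 8950) * upper_tail(h1)
s2 = (2754 / 7290) * upper_tail(h2)
k, rho = solve_k_rho(h1, s1, h2, s2)
print(round(k, 3), round(rho, 3), round(success_ratio(0.5, k, rho), 3))
```

Because no table interpolation is involved, the values of $k$ and $\rho$ found this way may differ in the third decimal from those obtained by the linear interpolation described above.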
















4. An example. 

The admission test was standardized to have a mean 50 and 
standard deviation 10. The cutting score for admission was 60, and 
among the 8950 admitted 2943 completed the training successfully. 
Among the 7290 who scored at least 65 in the admission test, 2754 
were successful. What proportion of successful trainees may be ex- 
pected if the cutting score is lowered to 55? 


To answer this question, we compute

$$h_1 = \frac{60-50}{10} = 1.0, \quad R'(h_1) = \frac{2943}{8950} = .32883, \quad S(h_1) = .05217,$$

$$h_2 = \frac{65-50}{10} = 1.5, \quad R'(h_2) = \frac{2754}{7290} = .37778, \quad S(h_2) = .02524,$$

and obtain the equations (5)

$$P(1.0,k;\rho) = .05217, \qquad P(1.5,k;\rho) = .02524. \tag{4.1}$$

To solve this pair of equations for $k$ and $\rho$, we find from Pearson’s
tables

$P(1.0, .9; .35) = .055460$, $P(1.0, 1.0; .35) = .049414$,
and by inverse linear interpolation, $P(1.0, .9544; .35) = .05217$,
$P(1.5, .9; .35) = .027206$, $P(1.5, 1.0; .35) = .024506$,
and by inverse linear interpolation, $P(1.5, .9729; .35) = .02524$,
$P(1.0, .8; .30) = .057259$, $P(1.0, .9; .30) = .051217$,
and by inverse linear interpolation, $P(1.0, .8842; .30) = .05217$,
$P(1.5, .8; .30) = .027387$, $P(1.5, .9; .30) = .024722$,
and by inverse linear interpolation, $P(1.5, .8806; .30) = .02524$.


Denoting by $k_1(\rho)$ and $k_2(\rho)$ functions of $\rho$ such that

$$P(1.0, k_1(\rho); \rho) = .05217 \quad\text{and}\quad P(1.5, k_2(\rho); \rho) = .02524,$$

and assuming that in the interval $.30 \leq \rho \leq .35$ each of these functions
is approximately linear,

$$k_1(\rho) = a\rho + b, \qquad k_2(\rho) = c\rho + d, \tag{4.2}$$

we have

$$k_1(.35) = (.35)a + b = .9544; \qquad k_2(.35) = (.35)c + d = .9729;$$
$$k_1(.30) = (.30)a + b = .8842; \qquad k_2(.30) = (.30)c + d = .8806.$$






















Solving for $a$, $b$, and for $c$, $d$, we obtain $a = 1.4040$, $b = .4630$,
$c = 1.8460$, $d = .3268$, and from (4.2) the approximate linear expressions

$$k_1(\rho) = (1.4040)\rho + .4630, \qquad k_2(\rho) = (1.8460)\rho + .3268.$$

A solution of (4.1) is obtained by solving $k_1(\rho) = k_2(\rho)$, which yields

$$\rho = .3081, \qquad k = .8956.$$
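The arithmetic of this step can be retraced in a few lines of Python. The sketch below uses only the eight table entries and the two values of $S$ quoted in the example; the helper names are made up for illustration. Since the inputs are rounded to the digits printed in the text, the last decimal of the results may differ slightly from the figures given above.

```python
def inverse_interpolate(k_lo, p_lo, k_hi, p_hi, target):
    """k at which the straight line through two table entries hits target."""
    return k_lo + (k_hi - k_lo) * (p_lo - target) / (p_lo - p_hi)

def line_through(x0, y0, x1, y1):
    """Slope and intercept of the line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0

s1, s2 = 0.05217, 0.02524   # S(h1), S(h2) from the example

# Inverse linear interpolation in k at rho = .35 and rho = .30.
k1_35 = inverse_interpolate(0.9, 0.055460, 1.0, 0.049414, s1)
k2_35 = inverse_interpolate(0.9, 0.027206, 1.0, 0.024506, s2)
k1_30 = inverse_interpolate(0.8, 0.057259, 0.9, 0.051217, s1)
k2_30 = inverse_interpolate(0.8, 0.027387, 0.9, 0.024722, s2)

# Linear expressions k1 = a*rho + b and k2 = c*rho + d, as in (4.2).
a, b = line_through(0.30, k1_30, 0.35, k1_35)
c, d = line_through(0.30, k2_30, 0.35, k2_35)

# Intersection of the two lines gives the common rho and k.
rho = (b - d) / (c - a)
k = a * rho + b
print(round(rho, 4), round(k, 4))
```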


For cutting score 55, we have $h = .50$ and, using interpolation formula
(a) of p. XVIII of Pearson’s Tables, obtain

$$P(.50, .8956; .3081) = .088315.$$

Dividing this by

$$\frac{1}{\sqrt{2\pi}}\int_{.50}^{\infty} e^{-t^2/2}\,dt = .308537,$$

we finally obtain

$$R'(.50) = .2862.$$


Manuscript received 12/30/49 
Revised manuscript received 3/5/50 





















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


MAXIMIZING PREDICTIVE EFFICIENCY FOR 
A FIXED TOTAL TESTING TIME* 


CALVIN W. TAYLOR 
UNIVERSITY OF UTAH 


For any fixed total time of testing it is possible, through proper 
item-and-time allotment, to combine tests into a battery so that the 
multiple correlation with a pre-assigned criterion will be maximized. 
By holding constant the ratio of the length in number of items to 
the time length for each test, a set of general equations has been 
derived which will yield this maximum value of the multiple R and 
will enable one to determine, in any given case, the optimal fraction 
of total testing time that should be devoted to each type of test 
under consideration. The set of general equations is applied to a 
two-test-battery problem to obtain the optimal length of each type 
of test for one hour total testing time. If two other tests had been 
selected for the two-test sample problem, different subdivisions of 
the total time would generally occur. The manner in which the re- 
sults would change when using other tests with different initial 
reliability, validity, and intercorrelation values is briefly presented. 
Some general implications of this method of battery development 
are also discussed. 


I. The Problem 


The present paper deals with the identical problem covered in 
Horst’s recent article.† Though Horst’s development is carried to a
later stage, this presentation supplements his article by offering an 
independent and different mathematical and verbal treatment of the 
same basic problem. 

A factor frequently encountered in testing programs is that 
some restriction is placed on the total amount of time available for 
testing. If it is planned to use two or more tests in the battery, then 
the problem of how to distribute the total time to the separate parts 
of the battery immediately presents itself. In the past, this problem 
has nearly always been disregarded; even in procedures outlining 
how to construct a special set of tests for a testing program, there 
has been no clear-cut method for deciding how much time each type 
of test should be allowed. 

*The writer is indebted to Max Woodbury for his assistance and especially 
to Dr. N. J. F. Van Steenberg and Dr. Anna S. Henriques, who provided valu- 
able guidance and aid in the development of the solution to this problem. This 


paper is a revision of a thesis submitted in 1939 at the University of Utah in 
partial fulfillment of the requirements for the master’s degree. 

†Horst, Paul. Determination of optimal test length to maximize the multiple
correlation. Psychometrika, 1949, 14, 79-88.














In the present treatment, the ratio of length in number of items 
to time length is considered to be constant for each particular type 
of test. Thus, when a given test is changed in length, both the length 
in number of items and the time length are altered in the same pro- 
portion. Even with this imposed restriction it is apparent that a tre- 
mendous number of test batteries could be formed by varying the 
distribution of time and the corresponding length in items among the 
different types of tests. It is also obvious that the criterion will be 
predicted with different degrees of efficiency according to how the 
total testing time is subdivided among the available types of pre- 
dictors. 

In the final section a set of general equations has been derived 
which will permit one to determine the best breakdown of the total 
time in order to attain a maximum value of the multiple correlation 
coefficient. By this means, a battery of optimal length tests can be 
designed to yield the best prediction of the criterion within the avail- 
able testing period. 


II. Application 


In this section the derived set of general equations will be ap- 
plied to a hypothetical problem involving a two-test battery. 

Let $t$ be the total testing time proper that is available, and let
the two tests be numbered 1 and 2 and the criterion be numbered 0.
Let

$n_1$ = the fraction of the total time $t$ allotted to test 1 type of
items,

and $n_2$ = the fraction of the total time $t$ allotted to test 2 type of
items.

The total testing time then has the following breakdown:

$$n_1 t + n_2 t = t, \tag{1}$$

which simplifies to the following equation, which is a special case
of the general conditional equation (9a) presented later:

$$n_1 + n_2 - 1 = 0. \tag{2}$$

To start the problem, each test is considered to be of $t$ item-and-time
length and hereafter the correlations that are not primed are
for tests of this $t$ length.*


*If the original tests are not of this length, the correlations needed for the 
same type of tests of t length may be estimated by utilizing equation (6). 






















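The general equations (38), derived later, which give the best
time allotment, take the following form in the case of a two-test battery:

$$A_{01}\sqrt{\frac{1-r_{11}}{n_1 N_1}} = A_{02}\sqrt{\frac{1-r_{22}}{n_2 N_2}}, \tag{3}$$

where

$$N_1 = 1 + (n_1 - 1)\,r_{11}, \tag{4a}$$

$$N_2 = 1 + (n_2 - 1)\,r_{22}, \tag{4b}$$

$$A_{01} = \begin{vmatrix} r'_{10} & r'_{12} \\ r'_{20} & 1 \end{vmatrix} = (r'_{10} - r'_{02}\,r'_{12}), \quad\text{and} \tag{5a}$$

$$A_{02} = \begin{vmatrix} r'_{10} & 1 \\ r'_{20} & r'_{21} \end{vmatrix} = (r'_{01}\,r'_{12} - r'_{02}). \tag{5b}$$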








The correlation coefficients which are primed are those estimated
for the two tests when they are of $n_1 t$ and $n_2 t$ length, respectively.
The formula, upon which the derivation in the last section is built, is

$$r'_{ij} = r_{ij}\sqrt{\frac{n_i\,n_j}{N_i\,N_j}}. \tag{6}$$

By substituting the above values plus the value of $n_2$ from equation
(2) into equation (3) and solving for $n_1$,

$$n_1 = \frac{\pm\sqrt{\dfrac{1-r_{11}}{1-r_{22}}}\,(r_{01} - r_{02}\,r_{12}) + r_{02}\,(1 - r_{11})}{\pm\sqrt{\dfrac{1-r_{11}}{1-r_{22}}}\,(r_{01}\,r_{22} - r_{02}\,r_{12}) + (r_{01}\,r_{12} - r_{02}\,r_{11})}. \tag{7}$$








In the sample problem let the total testing time proper ($t$) be
one hour and let each of the tests be one hour in length. The following
statistics are given for tests 1 and 2, each of $t$ length.

$$r_{01} = .40 \qquad r_{11} = .90 \qquad r_{12} = .20$$
$$r_{02} = .50 \qquad r_{22} = .80$$

After substituting these values in equation (7), simplifying, and
selecting the appropriate sign, $n_1 = .31$.

Thus, the optimal length for the first type of test would be 31% 
as long in number of items and in time limit as the t length test 1. 






























































[Figure 1: the multiple correlation $R_{0.12}$ plotted against $n_1$, the
fraction of total time given to test 1, for $r_{12}$ = .00, .20, and .40,
with $r_{10}$ = .40 and $r_{20}$ = .50.]

FIGURE 1


The time allotted to this shortened test 1 would be 31% of an hour, 
or 18.6 minutes. 

Also, $n_2 = 1 - n_1 = .69$, so the optimal length for the second type
of test would be 69% as long as the t length test 2. The time allotted
to this shortened test 2 would be 41.4 minutes.
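The sample problem can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than the author's own computation: it takes $r_{11} = .90$ as the reliability of test 1, treats the criterion as unchanged in length when applying equation (6), evaluates equation (7) for both signs, and keeps the root lying between 0 and 1 that gives the larger multiple R. The function names are made up for this sketch.

```python
import math

def length_factor(r_ii, n):
    """N_i = 1 + (n_i - 1) r_ii, as in equations (4a) and (4b)."""
    return 1.0 + (n - 1.0) * r_ii

def multiple_r(n1, r01, r02, r11, r22, r12):
    """Multiple correlation of the criterion with tests cut to fractions
    n1 and 1 - n1 of length t, using equation (6) for the primed values."""
    n2 = 1.0 - n1
    a1 = n1 / length_factor(r11, n1)
    a2 = n2 / length_factor(r22, n2)
    v1 = r01 * math.sqrt(a1)        # r'_01 (criterion length unchanged)
    v2 = r02 * math.sqrt(a2)        # r'_02
    c = r12 * math.sqrt(a1 * a2)    # r'_12
    return math.sqrt((v1 * v1 + v2 * v2 - 2.0 * v1 * v2 * c) / (1.0 - c * c))

def optimal_fraction(r01, r02, r11, r22, r12):
    """Equation (7) for both signs; returns (n1, R) for the admissible
    root with the larger multiple R, or None if neither root is in (0, 1)."""
    q = math.sqrt((1.0 - r11) / (1.0 - r22))
    best = None
    for sign in (1.0, -1.0):
        num = sign * q * (r01 - r02 * r12) + r02 * (1.0 - r11)
        den = sign * q * (r01 * r22 - r02 * r12) + (r01 * r12 - r02 * r11)
        if den == 0.0:
            continue
        n1 = num / den
        if 0.0 < n1 < 1.0:
            r = multiple_r(n1, r01, r02, r11, r22, r12)
            if best is None or r > best[1]:
                best = (n1, r)
    return best

n1, r = optimal_fraction(r01=0.40, r02=0.50, r11=0.90, r22=0.80, r12=0.20)
print(round(n1, 2), round(60 * n1, 1), round(r, 3))
```

Calling `optimal_fraction` again with a different `r12` or `r01` is a quick way to see how the optimum moves when the intercorrelation or a validity is changed.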

In solving a two-test battery problem, the appropriate statistics 
for tests of t length could be substituted into equation (7) for a di- 
rect solution. But for larger battery problems, after substituting 
the appropriate statistical values into the general equations (9a) 
and (38) found in the last section, it is necessary to complete a se- 



















quence of intermediate steps corresponding to those between equa- 
tion (6) and equation (7) to solve for the unknowns. 


III. Discussion 


In test theory there is a principle that seems, upon first expo- 
sure, to be somewhat paradoxical—namely, that if two unit-length 
tests with the same validity but with different reliability values are 
lengthened an equal amount, a greater gain in validity will result 
from lengthening the less reliable test. In the present problem the 
closely related principle also operating is that if two unit-length tests 
with the same validity but with different reliability values are short- 
ened an equal amount (with the resulting length still greater than 
zero), the greater loss in validity will result from shortening the less 
reliable test.* Therefore, in terms of validity shrinkage, the less reli- 
able test cannot withstand shortening as well as the more reliable 
test. This concept is important in understanding the present prob- 
lem in which two or more types of tests are in competition for the 
available testing time. 
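Both principles can be seen in a short numerical sketch. The Python below assumes that the validity of a test changed to $n$ times its length follows from equation (6) with the criterion left unchanged, that is, $r'_{0i} = r_{0i}\sqrt{n_i/N_i}$; the two tests, with equal validity .50 and reliabilities .90 and .60, are hypothetical.

```python
import math

def validity_at_length(r_0i, r_ii, n):
    """Estimated validity when a unit-length test is changed to n times
    its length: r_0i * sqrt(n / (1 + (n - 1) * r_ii))."""
    return r_0i * math.sqrt(n / (1.0 + (n - 1.0) * r_ii))

# Hypothetical pair: equal validity .50, reliabilities .90 and .60.
for n in (0.5, 1.0, 2.0):
    more_reliable = validity_at_length(0.50, 0.90, n)
    less_reliable = validity_at_length(0.50, 0.60, n)
    print(n, round(more_reliable, 3), round(less_reliable, 3))
```

At half length the less reliable test has the lower estimated validity, at double length it has the higher, and the two agree at unit length.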

The middle curve in Figure 1 shows the relationship in the sample
problem between the multiple correlation coefficient, $R_{0.12}$, and
the fraction, $n_1$, of the total time devoted to test 1 type of items. The
maximum value in the middle curve is reached above the baseline
value of $n_1 = .31$, as was found in the algebraic solution. A close
approximation, such as $n_1 = 1/3$, could be used since the R values
of the curve approach the maximum value in the immediately adjacent
range. The location of the maximum to the left side of the
curve is understandable when the above principle on shortened tests 
is applied. Since test 2 is more valid and less reliable than is test 1, 
for best combined prediction, test 2 should be shortened less than 
test 1 because more validity would, in proportion, be lost in shorten- 
ing test 2 than by shortening test 1. 

At the extreme left side of the curve where $n_1$ is zero, the total
time is devoted to test 2, and at the extreme right side of the curve 
the entire hour is devoted to test 1. Thus the curve drops to the va- 
lidity of one initial test on one side and to the validity of the other 

*This principle can be verified by applying equation (12) to each of the two 
tests described in the statement. Only the denominator in the term on the right 


side of the equation would differ for the two tests and this denominator, for
values of $n_i$ less than 1 and greater than 0, would be smaller for the more
reliable test. Therefore, the estimated validity coefficient derived from the more
reliable test would be greater than the one for the less reliable test when the 
two tests are shortened an equal amount. Both principles can be verified graph- 
ically by plotting for each of the two tests a curve showing the relationship be- 
tween length and validity. These two curves will intersect at unit length. 

















396 PSYCHOMETRIKA 
initial test on the other side. 

There is a portion of the curve at the right in which the multiple 
R is less than the .50 value which can be obtained at the extreme left 
by devoting the total time to test 2. Thus, in the division of time 
between two tests, there can be cases when the combined predictive 
efficiency is less than it would be if all the time were devoted to the 
initially more valid test. This is particularly possible when one test 
is much more valid than the other and when the correlation between 
the two tests is fairly high (not considering possible suppressor ef- 
fects). 
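
A curve of this kind can be traced numerically. The sketch below is only an illustration under stated assumptions: it uses the two-predictor formula for multiple R together with the length-change form r' = r * sqrt(n_i n_j / (alpha_i alpha_j)), and the validities, reliabilities, and intercorrelation in `stats` are hypothetical values, not the statistics of the sample problem.

```python
import math


def altered(r, n_i, n_j, r_ii, r_jj):
    """Correlation after both tests change length (assumed form of equation (6))."""
    alpha_i = 1.0 + (n_i - 1.0) * r_ii
    alpha_j = 1.0 + (n_j - 1.0) * r_jj
    return r * math.sqrt((n_i * n_j) / (alpha_i * alpha_j))


def multiple_r(n1, r01, r02, r12, r11, r22):
    """Multiple R of the criterion on tests 1 and 2 when test 1 gets fraction n1."""
    n2 = 1.0 - n1
    v1 = altered(r01, n1, 1.0, r11, 1.0)  # the criterion keeps its length
    v2 = altered(r02, n2, 1.0, r22, 1.0)
    c = altered(r12, n1, n2, r11, r22)
    r_squared = (v1 * v1 + v2 * v2 - 2.0 * v1 * v2 * c) / (1.0 - c * c)
    return math.sqrt(r_squared)


# Hypothetical statistics for two tests of full (unit) length.
stats = dict(r01=0.40, r02=0.50, r12=0.20, r11=0.90, r22=0.60)

grid = [i / 1000 for i in range(1001)]
best_n1 = max(grid, key=lambda n1: multiple_r(n1, **stats))

print("R with all time on test 2:", round(multiple_r(0.0, **stats), 3))
print("R with all time on test 1:", round(multiple_r(1.0, **stats), 3))
print("best n1 on the grid:", best_n1, "R:", round(multiple_r(best_n1, **stats), 3))
```

At the two ends of the grid the function returns the validity of the single test that receives all of the time, which matches the behavior of the curve described above.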

The two other curves in Figure 1 were obtained by taking an
intercorrelation of .00 in one case and .40 in the other case, with all
other statistics remaining the same as in the sample problem. As
expected, of the three cases the zero intercorrelation case gives the
highest curve with the highest maximum. An interesting trend occurs
with respect to the location of each maximum, for as the
intercorrelation becomes higher, the maximum drops down and shifts to
the left in an accelerated fashion, being at the $n_1$ values of .34, .31,
and .25, respectively, for the intercorrelations of .00, .20, and .40.


Other curves were developed and the trends were noted when
different test validities were assumed. In each case the reliabilities
were considered unchanged and the intercorrelations were taken as
zero. If the validity of test 1, as indicated at the right end of the
graph, were a lower value of .30, the maximum would drop to an R
of .55 and would shift considerably to the left (to $n_1$ = .26) in
comparison with the top (zero correlation) curve in Figure 1. If, in
another case, the validities of the two tests were both .40 with zero
intercorrelation, the maximum would drop to an R of .53 and would
shift toward the center. However, the maximum still would be
somewhat to the left of center at $n_1$ = .41 because of the lower
reliability of test 2 and the consequent more rapid loss in validity
from shortening test 2 than test 1.

Figure 1 can be obtained from a three-dimensional system with
$R_{0(12)}$ as the vertical axis and with $n_1$ and $n_2$ as the horizontal
axes, wherein tests 1 and 2 may vary in fraction-of-total-time length
from zero to unity. The multiple correlation surface in this system will
have a curved trace in the $n_1 R$ and in the $n_2 R$ coordinate planes, the
curve in each plane being the one showing the relationship between
the fraction-of-total-time length of the particular test and its
validity [see equation (12)]. The curve in Figure 1, then, shows the
trace of the multiple correlation surface in the plane $n_1 + n_2 = 1$.
Since in the present treatment $n_1$ and $n_2$ are not directly time
variables but are fraction-of-total-time variables, Figure 1 is not
appropriate as a basis for considering what will happen as the total
available testing time ($t$) varies. However, it can readily be
demonstrated that the best value of $n_1$ (and also the best value of $n_2$)
will not be constant but will vary as the total testing time ($t$) varies.
In fact, Horst’s two sample problems illustrate this point.


The present procedure permits an increased flexibility in test 
construction practice, since a final test built for a special test pro- 
gram might resemble other tests of the same type but differ from 
them in length. This more flexible approach should frequently re- 
sult in significantly better prediction than obtained by more rigid 
methods. The procedure calls for extending the idea of splitting a 
unit-length test into two parallel halves to the notion of dividing a 
test into several parallel fractional-length tests. In practice, there 
would be limitations on the number of equal-length, parallel subtests 
into which a given test could be divided, but once this smallest sub- 
division was determined, tests of any multiple length of this subtest 
could be built by combining parallel subtests and by constructing new 
parallel subtests for inclusion if the final test is to be longer than the 
initial one. Skill in construction of parallel forms would be neces- 
sary as would knowledge of test composition so that tests could be 
broken up into several parallel subtests as described. 


In battery developmental programs one might utilize this more 
flexible approach by having available a variety of different tests, each 
one being neither too long nor too short for trial purposes. It might 
be profitable to use these relatively short tests in experimental trials 
even though each one may not have reliability coefficients as large 
as has been customarily desired. Since testing time and facilities 
might be somewhat limited in experimental trials as well as in the 
final testing program and since the validity of the final battery should 
always be checked on another sample, there are at least four argu- 
ments for using a set of short experimental tests: First, if the “wash- 
out” rate of tests is high, as it often is, for example, in the develop- 
ment of aptitude test batteries, the failure of an experimental test to 
be valid would not be as noticeable as if the test had required a con- 
siderable portion of the experimental time. Second, in many cases a 
wider assortment of tests could be tried so that the chances of find- 
ing a number of different valid types would be greater than if longer 
standard tests were the only ones used in the experimental trials. 
Third, the short tests that proved to be valid could be lengthened, 
which would result in a clear-cut increase in their validities,
particularly if the reliability of the initial tests could be sizeably boosted.
And fourth, the final battery which would utilize all the available 
testing time would most likely have adequate reliability as well as the 
important maximum validity. 

The present problem has dealt with two variables, time length 
and length in number of items, that could be independent variables 
in test construction procedures. The important conditional equation 
in the problem is that the total time available for the final battery 
of tests is fixed. By imposing the restriction that time length and 
length in number of items are proportional, one of these two vari- 
ables becomes a dependent variable completely determined by the 
other variable. Since this restriction of proportionality of these two 
variables is implied in the general formula (10) for the correlation 
between scores in two types of tests as the test lengths vary (and 
also in the Spearman-Brown formula and in other special cases of 
this general formula), it thus becomes possible to use this general 
formula as the starting base for the derivation. 

Before imposing the restriction that time length and length in 
items be proportional, one could, in a given prediction problem, in- 
vestigate the effect on validity of altering the time limit for a con- 
stant number of items of a certain type. This problem has been dis- 
cussed briefly by Guilford* who, in citing a study on time-limit tests 
by Lindquist and Cook, states that for a given amount of test ma- 
terial there is an optimal time defined as the shortest time at which 
a greater increase of validity of the obtained scores can be secured 
through the addition of more (homogeneous) material with a pro- 
portionate increase in time than by permitting the extra time to be 
spent on the same material. 

Theoretically, then, in a given problem one could first find for 
each type of test an approximately optimal ratio of time length and 
length in items; secondly, by assuming that this optimal ratio remains 
the best ratio as longer or shorter tests of the same type are con- 
structed and by holding this ratio constant for each separate type 
of test, one could find by means of the present solution the optimal 
length of each type of test for a fixed total testing time; and thirdly, 
optimal regression weights could be utilized, according to the cus- 
tomary procedure in multiple regression applications. In this man- 
ner one could add the optimal ratio of time length and length in items 
to the double optimum called for in the present derivation, namely, 
optimal test lengths and optimal regression weights. 


*Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Co., 
1936, pp. 423-4. 






















An interesting side problem arises in connection with the above 
discussion. If the ratio of the time limit to the number of items 
varies, will the task in a particular type of test material vary in 
terms of psychological functions (factors) operating? If so, will the 
variation in the psychological nature of the task be found to exist 
for certain types of test materials and not for others, as this ratio
varies?

In a second article by Horst involving test lengths, a technique 
is presented that is designed to simplify the process of combining 
weighted scores of the parts of a battery.* Horst imposed a restric- 
tion that the multiple R be some specified value, which value should 
be less than the multiple R would be if the lengths of all tests were 
increased to infinity. Then by varying the lengths of the tests in 
the battery in a prescribed fashion, the desired weighted score for 
each type of test could be obtained by counting the number of items 
correct in the corresponding new-length test. The weighted battery 
score would then merely be the total number of items correct in the 
entire battery. In his solution no direct restriction was placed on 
the time utilized by the final battery. 


IV. The Derivation 

A portion of the total amount of time available for a final test- 
ing program will be consumed by such activities as distributing test 
materials and giving the directions and practice problems for each 
type of test. Let the balance of the total time, the testing time prop- 
er, be designated by the letter t. 

Let the reliability, validity, and intercorrelation coefficients be
known for the tests when each is transformed, or estimated as being
transformed, to $t$ length. Thus, an initial experimental test
would be estimated as being changed $m$ times both in item and in
time length to form its otherwise parallel test of $t$ length.

Hereafter the tests under consideration will be these transformed
tests, each of $t$ length, from which the final, optimal-length
tests will be derived.†

Let the transformed tests be designated 1, 2, 3, ..., $k$, and
the criterion be designated as 0. Let the fractions of the testing time
proper, $t$, allotted to each of the $k$ types of tests in the final
battery be $n_1, n_2, n_3, \ldots, n_k$. Also, for a given type of test, let it be
understood that when a change of length is discussed, the length in
number of items is changed in the same proportion as the time
length; i.e., the ratio of the number of items to the time limit for
these items remains constant.

*Horst, Paul. Regression weights as a function of test length.
Psychometrika, 1948, 13, 125-134.
†These new correlation coefficients can be estimated from the correlations
for the experimental tests by using the general formula given in
equation (10).

Thus, the total testing time $t$ will be subdivided in the following
manner in the final battery:

$$n_1 t + n_2 t + n_3 t + \cdots + n_k t = t. \tag{8}$$

Therefore,

$$\Bigl( \sum_{i=1}^{k} n_i \Bigr) - 1 = 0. \tag{9a}$$

Let this function, which provides a conditional equation in the
present problem, be designated as

$$f(n_1, n_2, n_3, \ldots, n_k) = \Bigl( \sum_{i=1}^{k} n_i \Bigr) - 1 = 0. \tag{9b}$$

Let $r_{ij}$ = the correlation between scores in tests $i$ and $j$, each
test being of $t$ length, and let

$r'_{ij} = r_{(n_i i)(n_j j)}$ = the correlation between scores in tests $(n_i i)$
and $(n_j j)$, when test $(n_i i)$ is obtained by modifying test $i$ from its
length $t$ to the new length $n_i t$, and test $j$ is changed to $n_j t$ length to
form test $(n_j j)$.

The correlation between tests $(n_i i)$ and $(n_j j)$ can be estimated
from the following equation:*

$$r'_{ij} = r_{(n_i i)(n_j j)} = \frac{n_i\, n_j\, r_{ij}}{\sqrt{n_i + (n_i^2 - n_i)\, r_{ii}}\; \sqrt{n_j + (n_j^2 - n_j)\, r_{jj}}}. \tag{10}$$


In deriving the above formula and in using it in the present
problem, the main assumptions made for each test are that a given
test can be broken up into several equal-length parallel forms and
that additional parallel forms of the same length can be constructed
so that all of the parallel forms have equal standard deviations and
equal intercorrelations and each will have the same correlation with
any other given test of unit length. It is also assumed that when a
test is subdivided or when subtests are combined to form new-length
tests, other factors, such as the warming-up effect, fatigue, boredom,
degree of familiarity with the test materials, etc., will not enter in
to change the nature of the task from what it was in the initial test.

*See equation (111) and pages 191-6 and 203-5 in Peters, Charles C., and
Van Voorhis, Walter R. Statistical procedures and their mathematical bases.
New York: McGraw-Hill Book Co., Inc., 1940. Special cases which can be
derived from this general equation include the Spearman-Brown formula, the index
of reliability, the correction for attenuation, and equation (12).

Let

$$\alpha_i = [1 + (n_i - 1)\, r_{ii}] \tag{11a}$$

and

$$\alpha_j = [1 + (n_j - 1)\, r_{jj}]. \tag{11b}$$

After dividing both the numerator and the denominator of
equation (10) by $\sqrt{n_i n_j}$ and substituting, equation (10) takes the form
presented in the practice problem:

$$r'_{ij} = r_{ij} \sqrt{\frac{n_i\, n_j}{\alpha_i\, \alpha_j}}. \tag{6}$$

The special case of equation (6) for estimating the validity
coefficients for the new length tests is:

$$r'_{0j} = r_{0j} \sqrt{\frac{n_j}{\alpha_j}}. \tag{12}$$
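
The agreement between the general formula (10) and the reduced form (6) can be checked numerically, along with the Spearman-Brown special case mentioned in the footnote to equation (10). The sketch below is illustrative only; the function names and the numerical values are hypothetical, and the symbol alpha stands for the quantity 1 + (n - 1) r defined in (11a) and (11b).

```python
import math


def corr_general(r_ij, r_ii, r_jj, n_i, n_j):
    """Equation (10): correlation after test i is changed n_i times and test j n_j times."""
    denom_i = math.sqrt(n_i + (n_i ** 2 - n_i) * r_ii)
    denom_j = math.sqrt(n_j + (n_j ** 2 - n_j) * r_jj)
    return (n_i * n_j * r_ij) / (denom_i * denom_j)


def corr_reduced(r_ij, r_ii, r_jj, n_i, n_j):
    """Equation (6): the same quantity written with alpha = 1 + (n - 1) * r."""
    alpha_i = 1.0 + (n_i - 1.0) * r_ii
    alpha_j = 1.0 + (n_j - 1.0) * r_jj
    return r_ij * math.sqrt((n_i * n_j) / (alpha_i * alpha_j))


def spearman_brown(r, n):
    """Reliability of a test lengthened n times."""
    return n * r / (1.0 + (n - 1.0) * r)


# Hypothetical values: two tests shortened to fractions of their length.
general = corr_general(0.20, 0.90, 0.60, 0.25, 0.75)
reduced = corr_reduced(0.20, 0.90, 0.60, 0.25, 0.75)

# Two parallel forms with correlation r, each lengthened n times, behave as
# in the Spearman-Brown formula.
parallel = corr_general(0.70, 0.70, 0.70, 2.0, 2.0)

print(round(general, 4), round(reduced, 4), round(parallel, 4))
```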
A suitable form of the equation for multiple correlation for the 
present problem is* 


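$$R^2_{0(123 \ldots k)} = 1 - \frac{\Delta}{\Delta_{00}}, \tag{13}$$

where $\Delta$ and $\Delta_{00}$ are symmetrical determinants of the following
values:

$$\Delta = \begin{vmatrix} 1 & r'_{01} & r'_{02} & \cdots & r'_{0k} \\ r'_{10} & 1 & r'_{12} & \cdots & r'_{1k} \\ r'_{20} & r'_{21} & 1 & \cdots & r'_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ r'_{k0} & r'_{k1} & r'_{k2} & \cdots & 1 \end{vmatrix} \tag{14}$$

and $\Delta_{00}$ is obtained from $\Delta$ by crossing out the (0) row and the (0)
column:

$$\Delta_{00} = \begin{vmatrix} 1 & r'_{12} & \cdots & r'_{1k} \\ r'_{21} & 1 & \cdots & r'_{2k} \\ \vdots & \vdots & & \vdots \\ r'_{k1} & r'_{k2} & \cdots & 1 \end{vmatrix} \tag{15}$$

*Kelley, Truman L. Statistical method. New York: The Macmillan Co.,
1924, equation 275, p. 301.

In equation (13), in order that $R$ will be real and meaningful
with a value between +1 and 0, it is assumed that $\Delta_{00} \neq 0$, that
$(\Delta / \Delta_{00})$ is zero or positive, and that $\Delta_{00} \geq \Delta$.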
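
Equation (13) can be checked against the familiar regression form of the squared multiple correlation. The sketch below assumes NumPy is available; the 3 x 3 correlation matrix is hypothetical, with row and column 0 standing for the criterion.

```python
import numpy as np


def multiple_r_squared(corr):
    """Equation (13): R^2 = 1 - Delta / Delta_00, with row/column 0 as the criterion."""
    corr = np.asarray(corr, dtype=float)
    delta = np.linalg.det(corr)
    delta_00 = np.linalg.det(corr[1:, 1:])
    return 1.0 - delta / delta_00


# Hypothetical correlations among the criterion (0) and two tests (1, 2).
corr = np.array([
    [1.0, 0.4, 0.5],
    [0.4, 1.0, 0.2],
    [0.5, 0.2, 1.0],
])

r2_det = multiple_r_squared(corr)

# The same quantity from the regression form r' R^{-1} r.
validities = corr[0, 1:]
r2_reg = validities @ np.linalg.solve(corr[1:, 1:], validities)

print(round(r2_det, 5), round(r2_reg, 5))
```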
Let the general term for the cell entry in these determinants be
$a_{ij}$. When $i \neq j$, from equations (14) and (15)

$$a_{ij} = r'_{ij} = r_{ij} \sqrt{\frac{n_i\, n_j}{\alpha_i\, \alpha_j}}, \tag{16}$$

and when

$$i = j, \qquad a_{ii} = 1. \tag{17}$$

These are the values of the entries in the determinants $\Delta$ and
$\Delta_{00}$. The variables are $n_1, n_2, n_3, \ldots, n_k$, so there are
$k$ variables in the problem.

It can be seen from equation (13) that $R$ will be a maximum
when $(\Delta / \Delta_{00})$ is a minimum, so the solution can be obtained by
finding the values of the variables that will yield a minimum value of
$\Delta / \Delta_{00}$.

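The Lagrangian method of undetermined multipliers is applicable
to this type of problem. The desired minimum value of $(\Delta / \Delta_{00})$
occurs at those values of $n_1, n_2, n_3, \ldots, n_k$ for which

$$\frac{\partial \left( \Delta / \Delta_{00} \right)}{\partial n_i} + \lambda \frac{\partial f}{\partial n_i} = 0,$$

where $\lambda$ is a Lagrangian multiplier and $i$ varies from 1 to $k$, or

$$\frac{\partial \left( \Delta / \Delta_{00} \right)}{\partial n_i} = -\lambda \frac{\partial f}{\partial n_i}. \tag{18}$$

From equation (9b),

$$\frac{\partial f}{\partial n_i} = 1. \tag{19}$$

Also

$$\frac{\partial \left( \Delta / \Delta_{00} \right)}{\partial n_i} = \frac{\Delta_{00} \dfrac{\partial \Delta}{\partial n_i} - \Delta \dfrac{\partial \Delta_{00}}{\partial n_i}}{\Delta_{00}^2}. \tag{20}$$

Substituting these values in equations (18) and cross multiplying,

$$\Delta \frac{\partial \Delta_{00}}{\partial n_i} - \Delta_{00} \frac{\partial \Delta}{\partial n_i} = \lambda\, \Delta_{00}^2. \tag{21}$$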


Let the following notations be used: 


$\Delta_{hj}$ = the determinant obtained from $\Delta$ by crossing out row
$h$ and column $j$;

$\Delta_{0h \cdot 0j}$ = the determinant obtained from $\Delta_{00}$ by crossing out
row $h$ and column $j$.


The partial derivative of a determinant with respect to an
independent variable may be found by first taking the product of the
cofactor of an element and the partial derivative of that element and
then by summing all of these products obtained from each element
in the determinant.* Since the cofactor of an element equals the
position sign of the element times the minor of that element, the
formula for the partial derivative of the determinant $\Delta$, in terms of
minors, takes the following form:

$$\frac{\partial \Delta}{\partial n_i} = \sum_{h=0}^{k} \sum_{j=0}^{k} (-1)^{h+j}\, \Delta_{hj}\, \frac{\partial (a_{hj})}{\partial n_i}. \tag{22}$$
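
The rule in equation (22) can be illustrated on a small determinant. The sketch below assumes NumPy; the 3 x 3 matrix, in which a single symmetric pair of entries depends on a variable x, is a hypothetical example rather than the matrix of the present problem, and the helper names are not from the article.

```python
import numpy as np


def minor(matrix, row, col):
    """Determinant left after crossing out one row and one column."""
    reduced = np.delete(np.delete(matrix, row, axis=0), col, axis=1)
    return np.linalg.det(reduced)


def det_derivative(matrix, d_matrix):
    """Sum over all elements of (position sign) * minor * element derivative."""
    size = matrix.shape[0]
    total = 0.0
    for h in range(size):
        for j in range(size):
            total += (-1) ** (h + j) * minor(matrix, h, j) * d_matrix[h, j]
    return total


# Hypothetical symmetric matrix whose (0, 1) and (1, 0) entries equal x.
x = 0.4
matrix = np.array([
    [1.0, x, 0.5],
    [x, 1.0, 0.2],
    [0.5, 0.2, 1.0],
])
# Element-by-element derivative of the matrix with respect to x.
d_matrix = np.zeros((3, 3))
d_matrix[0, 1] = d_matrix[1, 0] = 1.0

by_cofactors = det_derivative(matrix, d_matrix)
# For this matrix the determinant is 0.71 + 0.2 x - x^2, so the exact
# derivative is 0.2 - 2 x.
exact = 0.2 - 2.0 * x

print(round(by_cofactors, 6), round(exact, 6))
```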
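Since $\Delta$ is symmetrical, since the diagonal element is unity, and
since the partial derivative of any element $a_{hj}$ is zero except when
the element contains $n_i$, it can readily be shown that

$$\frac{\partial \Delta}{\partial n_i} = \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{i+j}\, \Delta_{ij}\, r_{ij}\, (1 - r_{ii}) \sqrt{\frac{n_j}{\alpha_j\, n_i\, \alpha_i^3}}. \tag{23}$$

*Veblen, Oswald. Invariants of quadratic differential forms. Cambridge,
England: The University Press, 1933, p. 8.

Similarly,

$$\frac{\partial \Delta_{00}}{\partial n_i} = \sum_{\substack{j=1 \\ j \neq i}}^{k} (-1)^{i+j}\, \Delta_{0i \cdot 0j}\, r_{ij}\, (1 - r_{ii}) \sqrt{\frac{n_j}{\alpha_j\, n_i\, \alpha_i^3}}. \tag{24}$$

Substituting these values in equations (21),

$$\Delta \sum_{\substack{j=1 \\ j \neq i}}^{k} (-1)^{i+j}\, \Delta_{0i \cdot 0j}\, r_{ij}\, (1 - r_{ii}) \sqrt{\frac{n_j}{\alpha_j\, n_i\, \alpha_i^3}} - \Delta_{00} \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{i+j}\, \Delta_{ij}\, r_{ij}\, (1 - r_{ii}) \sqrt{\frac{n_j}{\alpha_j\, n_i\, \alpha_i^3}} = \lambda\, \Delta_{00}^2. \tag{25}$$

From equation (16),

$$r_{ij} \sqrt{\frac{n_j}{\alpha_j}} = a_{ij} \sqrt{\frac{\alpha_i}{n_i}}. \tag{26}$$

Substituting and factoring,

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \left[ \Delta \sum_{\substack{j=1 \\ j \neq i}}^{k} (-1)^{i+j}\, a_{ij}\, \Delta_{0i \cdot 0j} - \Delta_{00} \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{i+j}\, a_{ij}\, \Delta_{ij} \right] = \lambda\, \Delta_{00}^2. \tag{27}$$

Let

$$\Delta' = \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{i+j}\, a_{ij}\, \Delta_{ij}, \tag{28}$$

in which the diagonal term $a_{ii}$ would be zero, and let $\Delta'_{00}$ be the
corresponding sum formed from $\Delta_{00}$.
Substituting $\Delta'$ and $\Delta'_{00}$ in equations (27) gives

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \left( \Delta\, \Delta'_{00} - \Delta_{00}\, \Delta' \right) = \lambda\, \Delta_{00}^2. \tag{29}$$
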


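From equation (28) it can be seen that $\Delta$ and $\Delta'$ are identical
except for the term involving the diagonal. Expanding $\Delta$ and
summing all except the diagonal term gives

$$\Delta = (-1)^{i+i}\, a_{ii}\, \Delta_{ii} + \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{i+j}\, a_{ij}\, \Delta_{ij}. \tag{30}$$

But the last term above is equivalent to $\Delta'$ in equation (28), so

$$\Delta' = \Delta - \Delta_{ii}. \tag{31}$$

Similarly,

$$\Delta'_{00} = \Delta_{00} - \Delta_{0i \cdot 0i}. \tag{32}$$

Substituting these values in equations (29) and canceling,

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \left( \Delta_{00}\, \Delta_{ii} - \Delta\, \Delta_{0i \cdot 0i} \right) = \lambda\, \Delta_{00}^2. \tag{33}$$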
Ni Nj 


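Bôcher gives a corollary pertaining to determinants and minors
of determinants which, when applied to the present problem,
becomes:*

$$\begin{vmatrix} (-1)^{0+0}\, \Delta_{00} & (-1)^{0+i}\, \Delta_{0i} \\ (-1)^{i+0}\, \Delta_{i0} & (-1)^{i+i}\, \Delta_{ii} \end{vmatrix} = (-1)^{2i}\, (\Delta)(\Delta_{0i \cdot 0i}). \tag{34}$$

Expanding the determinant and solving for $\Delta^2_{0i}$,

$$\Delta^2_{0i} = \Delta_{00}\, \Delta_{ii} - \Delta\, \Delta_{0i \cdot 0i}. \tag{35}$$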
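
Equation (35) can be verified numerically for a small symmetric determinant. The sketch below assumes NumPy and uses a hypothetical 3 x 3 correlation matrix (criterion in row and column 0); the helper names are illustrative.

```python
import numpy as np


def minor(matrix, row, col):
    """Determinant left after crossing out one row and one column."""
    reduced = np.delete(np.delete(matrix, row, axis=0), col, axis=1)
    return np.linalg.det(reduced)


def check_equation_35(corr, i):
    """Return both sides of Delta_0i^2 = Delta_00 Delta_ii - Delta Delta_0i.0i."""
    delta = np.linalg.det(corr)
    delta_00 = minor(corr, 0, 0)
    delta_ii = minor(corr, i, i)
    delta_0i = minor(corr, 0, i)
    # Delta_0i.0i: cross out rows and columns 0 and i together.
    keep = [idx for idx in range(corr.shape[0]) if idx not in (0, i)]
    delta_0i_0i = np.linalg.det(corr[np.ix_(keep, keep)])
    return delta_0i ** 2, delta_00 * delta_ii - delta * delta_0i_0i


corr = np.array([
    [1.0, 0.4, 0.5],
    [0.4, 1.0, 0.2],
    [0.5, 0.2, 1.0],
])

for i in (1, 2):
    left, right = check_equation_35(corr, i)
    print(i, round(left, 6), round(right, 6))
```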


Substituting the above value into equations (33),

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \Delta^2_{0i} = \lambda\, \Delta^2_{00}, \quad \text{where } i \text{ varies from 1 to } k. \tag{36}$$

Since the term on the right side is constant as $i$ varies, it can be
replaced by one of the left hand terms for any particular variable, such
as $n_1$. Thus,

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \Delta^2_{0i} = \left( \frac{1 - r_{11}}{n_1\, \alpha_1} \right) \Delta^2_{01}, \quad \text{where } i \text{ varies from 2 to } k. \tag{37}$$

After taking the square root,

$$\Delta_{0i} \sqrt{\frac{1 - r_{ii}}{n_i\, \alpha_i}} = \pm\, \Delta_{01} \sqrt{\frac{1 - r_{11}}{n_1\, \alpha_1}}. \tag{38}$$

Equations (38) together with equation (9a), in general, provide
sufficient equations so that a solution can be obtained for each
of the $k$ variables. The sign in equations (38) should be selected to
give fraction-of-total-time values of the variables that are positive
and real and that fit the case in any given problem.

*Bôcher, M. Introduction to higher algebra. New York: The Macmillan Co.,
1907, p. 38, corollary 3.

In solving a given problem by means of the present procedure, 
one must first transform the statistics for the initial tests to esti- 
mated statistics for equal-length tests, each of t length. After sub- 
stituting these estimated statistics into equations (38), it is neces- 
sary to undertake further computational steps to complete the solu- 
tion, as indicated in the simple practice problem, because the un- 
knowns are not solved for explicitly in equations (38) and (9a). 

For an initial experimental test transformed m times to make 
a t length test and then transformed mn times from t length to form 
the final optimal-length test, the final test is obtained by changing 
the initial test mn times in length. 
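
Because the unknowns are not solved for explicitly, a numerical search is one way to complete a solution. The sketch below is an illustration for two tests, not the computational routine of the practice problem: it assumes NumPy, uses hypothetical statistics for the t-length tests, finds the best division of time by a simple grid search on R squared, and then evaluates the two sides of equation (37) at that point.

```python
import numpy as np


def altered_matrix(corr, reliabilities, fractions):
    """Matrix of entries a_ij from equations (16) and (17); row/column 0 is the criterion."""
    n = np.concatenate(([1.0], fractions))        # the criterion keeps its length
    rel = np.concatenate(([1.0], reliabilities))  # placeholder value, so alpha_0 = 1
    alpha = 1.0 + (n - 1.0) * rel
    scale = np.sqrt(n / alpha)
    out = corr * np.outer(scale, scale)
    np.fill_diagonal(out, 1.0)
    return out


def r_squared(corr, reliabilities, fractions):
    """Equation (13) applied to the altered-length tests."""
    a = altered_matrix(corr, reliabilities, fractions)
    return 1.0 - np.linalg.det(a) / np.linalg.det(a[1:, 1:])


def condition_terms(corr, reliabilities, fractions):
    """The quantity (1 - r_ii) / (n_i alpha_i) * Delta_0i^2 for each test i."""
    a = altered_matrix(corr, reliabilities, fractions)
    alpha = 1.0 + (fractions - 1.0) * reliabilities
    terms = []
    for i in range(1, len(fractions) + 1):
        delta_0i = np.linalg.det(np.delete(np.delete(a, 0, axis=0), i, axis=1))
        weight = (1.0 - reliabilities[i - 1]) / (fractions[i - 1] * alpha[i - 1])
        terms.append(weight * delta_0i ** 2)
    return np.array(terms)


# Hypothetical statistics for two tests, each of t length.
corr = np.array([
    [1.0, 0.4, 0.5],
    [0.4, 1.0, 0.2],
    [0.5, 0.2, 1.0],
])
reliabilities = np.array([0.9, 0.6])

grid = np.linspace(0.001, 0.999, 9981)  # steps of .0001 in n_1
best_n1 = max(grid, key=lambda n1: r_squared(corr, reliabilities, np.array([n1, 1.0 - n1])))
terms = condition_terms(corr, reliabilities, np.array([best_n1, 1.0 - best_n1]))

print("best n_1 on the grid:", round(float(best_n1), 4))
print("two sides of (37):", terms)
```

At the grid maximum the two terms agree to within the resolution of the grid, which is what equation (37) requires of a solution.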


Manuscript received 12/20/48 
Revised manuscript received 12/10/49 






















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


A NOTE ON OPTIMAL TEST LENGTH 


PAUL HORST 
UNIVERSITY OF WASHINGTON 


Elsewhere in this issue is an article by Dr. Calvin Taylor en- 
titled “Maximizing Predictive Efficiency for a Fixed Total Testing 
Time.” The purpose of this note is to point out that the formulation 
developed by Dr. Taylor is mathematically equivalent to one present- 
ed by me in a previous issue of this journal.* 

We start from Taylor’s equation (36)

$$\left( \frac{1 - r_{ii}}{n_i\, \alpha_i} \right) \Delta^2_{0i} = \lambda\, \Delta^2_{00}. \qquad \text{[Taylor's Eq. (36)]}$$

If we let

$r$ = the correlation matrix of tests of length $T$,
$r_c$ = the vector of validities of these tests,
$r_{ii}$ = the reliability of test $i$ of length $T$,
$D_n$ = a diagonal matrix of the proportions of total time taken by
the tests of altered length,
$D$ = a diagonal matrix whose typical element is
$\dfrac{n_i}{1 + (n_i - 1)\, r_{ii}}$,
$D_u$ = a diagonal matrix whose typical element is $1 - r_{ii}$,
$D_{r_{ii}}$ = a diagonal matrix of the $r_{ii}$, and
$V_n = D_n 1$, or the vector of $n_i$'s,


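then Taylor’s equation (36) can be written in the matrix form:

$$K 1 = D_n^{-1} D_u^{1/2} D^{1/2} \left[ D^{1/2} \left( r - I + D^{-1} \right) D^{1/2} \right]^{-1} D^{1/2}\, r_c, \tag{1}$$

where $K$ is a constant to be determined.
But

$$D = D_n \left( D_u + D_n D_{r_{ii}} \right)^{-1}. \tag{2}$$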


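Hence from (1) and (2) we get

$$r_c = K \left[ r - I + \left( D_u + D_n D_{r_{ii}} \right) D_n^{-1} \right] D_u^{-1/2} D_n 1, \tag{3}$$

which simplifies to

$$V_n = \frac{D_u^{1/2} \left[ r - I + D_{r_{ii}} \right]^{-1} \left( r_c - K D_u^{1/2} 1 \right)}{K}. \tag{4}$$

Remembering that $\sum n_i = 1$ and letting $S = (r - I + D_{r_{ii}})$, we have
from (4)

$$1' V_n = 1 = \frac{1' D_u^{1/2} S^{-1} \left( r_c - K D_u^{1/2} 1 \right)}{K}. \tag{5}$$

Solving (5) for $K$ and substituting in (4) we get

$$V_n = \frac{D_u^{1/2} \left[ \left( 1 + 1' D_u^{1/2} S^{-1} D_u^{1/2} 1 \right) S^{-1} r_c - \left( 1' D_u^{1/2} S^{-1} r_c \right) S^{-1} D_u^{1/2} 1 \right]}{1' D_u^{1/2} S^{-1} r_c}. \tag{6}$$

*Horst, Paul. Determination of optimal test length to maximize the multiple
correlation. Psychometrika, 1949, 14, 79-88.
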


Eq. (6) then is the explicit solution for the $n_i$. Equation (6) can be
shown to be a special case of my method where my D, is taken as 
I T, i.e., all tests are assumed to be of length 7. 

My equations (26) and (32) can readily be reduced to the form 
(6) ; however, I believe their present form is computationally more 
convenient. 
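
The explicit solution can be written out in a few lines. The sketch below assumes NumPy and the reading of equation (6) in which S is the correlation matrix with reliabilities in the diagonal; the function name and the numerical values are hypothetical. It also presumes that all regression weights take the same sign, since the solution rests on one choice of square root in Taylor's equation (36). The result is compared with the closed form that holds for two uncorrelated tests.

```python
import numpy as np


def explicit_fractions(r, validities, reliabilities):
    """Fractions of total time n_i from the explicit solution, equation (6)."""
    r = np.asarray(r, dtype=float)
    validities = np.asarray(validities, dtype=float)
    reliabilities = np.asarray(reliabilities, dtype=float)
    k = len(validities)
    root_u = np.sqrt(1.0 - reliabilities)       # diagonal of D_u^(1/2)
    s = r - np.eye(k) + np.diag(reliabilities)  # S = r - I + D_rii
    s_inv_rc = np.linalg.solve(s, validities)   # S^-1 r_c
    s_inv_u = np.linalg.solve(s, root_u)        # S^-1 D_u^(1/2) 1
    a = root_u @ s_inv_u                        # 1' D_u^(1/2) S^-1 D_u^(1/2) 1
    b = root_u @ s_inv_rc                       # 1' D_u^(1/2) S^-1 r_c
    return root_u * ((1.0 + a) * s_inv_rc - b * s_inv_u) / b


# Hypothetical case: two uncorrelated tests of length T.
validities = np.array([0.4, 0.5])
reliabilities = np.array([0.9, 0.6])
fractions = explicit_fractions(np.eye(2), validities, reliabilities)

# Closed form for two uncorrelated tests, from equal marginal gains in R^2.
c1 = validities[0] * np.sqrt(1.0 - reliabilities[0])
c2 = validities[1] * np.sqrt(1.0 - reliabilities[1])
n1_closed = (c1 - c2 * (1.0 - reliabilities[0])) / (c2 * reliabilities[0] + c1 * reliabilities[1])

print(fractions, round(float(n1_closed), 5))
```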


Manuscript received 2/17/50 


























PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


PREDICTED DIFFERENCES AND DIFFERENCES 
BETWEEN PREDICTIONS* 


WILLIAM G. MOLLENKOPF 
EDUCATIONAL TESTING SERVICE 


When K tests are given to N individuals, and for each individ- 
ual there are two criterion measures, then (1) the multiple regres- 
sion weight to be applied to the standard score for each test to pre- 
dict the criterion-difference score equals the difference of the weights 
for predicting each criterion separately; (2) the difference between 
the predicted scores equals the predicted difference (each test being 
assigned the appropriate multiple regression weight); (3) the
square of the multiple correlation between predicted and actual
criterion-difference scores equals the sum of squares of the multiple
correlations of the battery with each criterion less twice the product
of these correlations and the correlation between predicted scores, all
divided by twice the quantity one minus the criterion
intercorrelation; and (4) the variance of errors of estimating the
criterion-difference score equals the sum of the variances of errors of estimating
each criterion score minus twice the criterion intercorrelation, plus 
twice the correlation between predicted scores multiplied by the 
product of the square root of one minus the variance of errors of 
estimating one criterion and the corresponding square root for the 
second criterion. 


Over twenty-five years ago Kelley (4) published a method for 
determining what proportion of the differences between scores on a 
pair of tests could not be accounted for by chance. Kelley’s method
was later utilized in a study by Segel (5). Much more recently
Bennett and Doppelt (1) have again called attention to the method in
the development of the Differential Aptitude Tests (3). The topic of 
differential prediction has also been discussed by Brogden (2) and by 
Thorndike (6, 7). The present article will give several derivations 
pertinent to differential prediction. 

Let us suppose that a battery of K tests has been administered 
to N individuals, and that for each of those individuals there are 
available measures in two criteria a and b. Let us suppose further 
that predictions of a and b are to be made by means of the usual 
multiple regression equation, with each test being assigned a weight 
for each criterion. The following questions are then raised. (1) How
are the multiple regression weights for the prediction of the
difference between the two criterion measures related to the weights for
predicting each criterion separately? (2) How are the differences
between the predicted scores in the two criteria related to the
predicted differences? (3) How is the multiple correlation between the
predicted and actual differences related to the two multiple
correlations with the pair of criteria? (4) How is the error of estimate
involved in predicting the criterion-difference score from this
battery related to the errors of estimate involved in predicting each
criterion from the battery?

*The author wishes to express his appreciation for the suggestions and
guidance given by Dr. Harold Gulliksen in the preparation of this article. He also
wishes to acknowledge the helpful comments of Dr. Paul Horst and Dr.
Ledyard Tucker on certain phases of the development.

The following notation will be employed: 


$i$ = a subscript denoting an individual,
$j, k$ = subscripts denoting tests,
$\beta$ = a multiple regression weight,
$z$ = a test score in standard form,
$K$ = the number of tests,
$N$ = the number of individuals,
$a, b$ = the two criterion measures, each in standard form,
$d$ = the difference between $a$ and $b$, and
$a^*, b^*, d^*$ = the values of $a$, $b$, and $d$, respectively, as
predicted from the $K$ tests by use of the multiple
regression weights.


Question 1 


For future reference we may indicate the equations for
predicting $a$ and $b$ as follows:

$$a^*_i = \beta_{a1} z_{1i} + \beta_{a2} z_{2i} + \cdots + \beta_{aK} z_{Ki}, \text{ and} \tag{1}$$

$$b^*_i = \beta_{b1} z_{1i} + \beta_{b2} z_{2i} + \cdots + \beta_{bK} z_{Ki}. \tag{2}$$

In matrix notation these become

$$a^*_i = z_{ij}\, \beta_{ja} \text{ and} \tag{3}$$

$$b^*_i = z_{ij}\, \beta_{jb}. \tag{4}$$

The multiple regression weights involved in (3) and (4) may
be stated in terms of covariances by the following derivation.
Let

$$Q = \sum_{i=1}^{N} (a_i - a^*_i)^2. \tag{5}$$

For a least squares solution, $Q$ is to be minimized.














In matrix form,

$$\begin{aligned} Q &= (a_i - z_{ij}\, \beta_{ja})' (a_i - z_{ij}\, \beta_{ja}) \\ &= a'a - \beta' z' a - a' z \beta + \beta' z' z \beta. \end{aligned} \tag{6}$$

By use of the partial derivative equated to zero, we find after
simplification that

$$z' z\, \beta_{ja} = z' a. \tag{7}$$

After multiplying both sides of (7) by $N^{-1}$, we then may write

$$C_{jk}\, \beta_{ka} = C_{ja}.$$

Premultiplying both sides by $C_{jk}^{-1}$, we then have

$$\beta_{ja} = C_{jk}^{-1} C_{ka}. \tag{8}$$

Similarly,

$$\beta_{jb} = C_{jk}^{-1} C_{kb}. \tag{9}$$

From the definition

$$d_i = a_i - b_i, \tag{10}$$

it follows that

$$C_{kd} = C_{ka} - C_{kb}. \tag{11}$$

The equation, analogous to (3) and (4), for predicting $d$ as
defined by (10) is

$$d^*_i = z_{ij}\, \beta_{jd}, \tag{12}$$

and the multiple regression weights in (12) may be found from

$$\beta_{jd} = C_{jk}^{-1} C_{kd}. \tag{13}$$

It is to be noticed that in using equation (12) one is predicting
$d$ directly from the set of $K$ tests, each test being given the multiple
regression weight shown in equation (13).
Substituting from (11) in (13), we find

$$\begin{aligned} \beta_{jd} &= C_{jk}^{-1} (C_{ka} - C_{kb}) \\ &= C_{jk}^{-1} C_{ka} - C_{jk}^{-1} C_{kb}. \end{aligned} \tag{14}$$

Hence, from (8) and (9), we have the relation among the three
sets of regression weights,

$$\beta_{jd} = \beta_{ja} - \beta_{jb}. \tag{15}$$
This is the solution to the first of the questions proposed.* The
multiple regression weight to be applied to standard scores on a
particular test for predicting the difference between the criteria is equal
to the difference between the multiple regression weights for
predicting each criterion separately, the weight assigned each test for each
purpose being the multiple regression weight.

*The result given in equation (15) is noted as being in agreement with the
statement made by Thorndike (6, p. 223).


Question 2 
From (12) and (13) we can write

$$d^*_i = z_{ij}\, C_{jk}^{-1} C_{kd}. \tag{16}$$

By use of (11), (8), (9), (3), and (4), equation (16) can be
transformed to

$$d^*_i = a^*_i - b^*_i. \tag{17}$$
This is the answer to the second question which we proposed. 
The difference between the two predicted scores for the two criteria 
is the same as the predicted difference between scores for these two 
criteria, when each test is assigned the multiple regression weight 
in each instance. 
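
Results (15) and (17) can be confirmed on simulated scores. The sketch below assumes NumPy; the sample size, the number of tests, and the generating weights are hypothetical, and the weights are computed from covariances in the manner of equations (8), (9), and (13).

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 200, 4


def standardize(x):
    """Put scores in standard form (mean 0, standard deviation 1)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Simulated standard scores on the tests and on two criteria a and b.
z = standardize(rng.standard_normal((n_people, n_tests)))
a = standardize(z @ np.array([0.5, 0.3, 0.0, 0.1]) + rng.standard_normal(n_people))
b = standardize(z @ np.array([0.1, 0.4, 0.3, 0.0]) + rng.standard_normal(n_people))
d = a - b


def weights(criterion):
    """Multiple regression weights from covariances: inverse(C_jk) times C_k,criterion."""
    cov_tests = z.T @ z / n_people
    cov_criterion = z.T @ criterion / n_people
    return np.linalg.solve(cov_tests, cov_criterion)


beta_a, beta_b, beta_d = weights(a), weights(b), weights(d)

print("weights for d:      ", np.round(beta_d, 4))
print("difference of weights:", np.round(beta_a - beta_b, 4))
```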


Question 3 


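The relation between the covariance of $d^*$ with $d$ and the
correlation of $d^*$ with $d$ is

$$R_{d^*d} = \sigma_{d^*}^{-1}\, C_{d^*d}\, \sigma_d^{-1}. \tag{18}$$

Let us now write $C_{d^*d}$ in terms of other covariances.

$$\begin{aligned} C_{d^*d} &= \frac{1}{N}\, d' d^* = \frac{1}{N}\, d' (a^* - b^*) \\ &= \frac{1}{N} \left[ d' a^* - d' b^* \right] \\ &= \frac{1}{N} \left[ (a - b)' a^* - (a - b)' b^* \right] \\ &= \frac{1}{N} \left[ a' a^* + b' b^* - b' a^* - a' b^* \right] \\ &= C_{a^*a} + C_{b^*b} - C_{a^*b} - C_{ab^*}. \end{aligned} \tag{19}$$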


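Consider the last two covariances on the right side of (19).

$$\begin{aligned} C_{a^*b} &= \frac{1}{N}\, a^{*\prime} b = \frac{1}{N} (z_{ij}\, \beta_{ja})'\, b_i \\ &= \beta'_{ja}\, C_{jb} \\ &= \beta'_{ja} \left[ C_{jk}\, C_{jk}^{-1} \right] C_{kb} \\ &= \frac{1}{N}\, \beta'_{ja}\, z'_{ij}\, z_{ik}\, \beta_{kb} \\ &= \frac{1}{N}\, a^{*\prime} b^* = C_{a^*b^*}. \end{aligned} \tag{20}$$

Similarly it can be shown that

$$C_{ab^*} = C_{a^*b^*}. \tag{21}$$

Hence we may rewrite (19) as

$$C_{d^*d} = C_{a^*a} + C_{b^*b} - 2\, C_{a^*b^*}. \tag{22}$$

Next $\sigma_{d^*}$ may be re-expressed by use of the formula for the
standard error of the difference between two correlated quantities.

$$\sigma_{d^*} = \sqrt{\sigma^2_{a^*} + \sigma^2_{b^*} - 2\, \sigma_{a^*}\, \sigma_{b^*}\, r_{a^*b^*}}. \tag{23}$$

By use of the same formula, we can write

$$\sigma_d = \sqrt{\sigma^2_a + \sigma^2_b - 2\, \sigma_a\, \sigma_b\, r_{ab}}, \tag{24}$$

which reduces, since $\sigma_a$ and $\sigma_b$ were taken to be unity, to

$$\sigma_d = \sqrt{2 (1 - r_{ab})}. \tag{25}$$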


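Let us now consider the terms $\sigma^2_{a^*}$ and $\sigma^2_{b^*}$.

$$\begin{aligned} \sigma^2_{a^*} &= \frac{1}{N} (z_{ij}\, \beta_{ja})' (z_{ij}\, \beta_{ja}) \\ &= \beta'_{ja}\, C_{jk}\, \beta_{ka} \\ &= C'_{ja}\, C_{jk}^{-1}\, C_{jk}\, \beta_{ka} \\ &= C'_{ja}\, \beta_{ja}. \end{aligned} \tag{26}$$

Since

$$r_{ja} = \sigma_j^{-1}\, C_{ja}\, \sigma_a^{-1}, \tag{27}$$

and we have taken $\sigma_j = 1$ and $\sigma_a = 1$, we may write

$$\sigma^2_{a^*} = r'_{ja}\, \beta_{ja}. \tag{28}$$

The term on the right-hand side of (28) is recognized as an
expression for the square of the multiple correlation coefficient.
Hence

$$\sigma^2_{a^*} = R^2_{a^*a} \text{ and} \tag{29}$$

$$\sigma_{a^*} = R_{a^*a}. \tag{30}$$

Similarly it can be shown that

$$\sigma_{b^*} = R_{b^*b}. \tag{31}$$

It is to be noted that

$$R_{a^*a} = \sigma_{a^*}^{-1}\, C_{a^*a}\, \sigma_a^{-1}. \tag{32}$$

Hence, from (30), it is evident that

$$C_{a^*a} = R^2_{a^*a}. \tag{33}$$

Similarly

$$C_{b^*b} = R^2_{b^*b}. \tag{34}$$

The covariance term $C_{a^*b^*}$ may be re-expressed as follows, using
(30) and (31):

$$C_{a^*b^*} = \sigma_{a^*}\, \sigma_{b^*}\, r_{a^*b^*} = R_{a^*a}\, R_{b^*b}\, r_{a^*b^*}. \tag{35}$$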
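Consequently, by using (18), (23), (25), (30), (31), (33),
(34), and (35), we can re-write (22) as

$$R_{d^*d} = \frac{\sqrt{R^2_{a^*a} + R^2_{b^*b} - 2\, R_{a^*a}\, R_{b^*b}\, r_{a^*b^*}}}{\sqrt{2 (1 - r_{ab})}}, \tag{36}$$

or

$$R^2_{d^*d} = \frac{R^2_{a^*a} + R^2_{b^*b} - 2\, R_{a^*a}\, R_{b^*b}\, r_{a^*b^*}}{2 (1 - r_{ab})}. \tag{37}$$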
This result is the answer to the third question proposed.* It can
be seen that the multiple correlation with the difference between the
criterion measures for any given correlation between criteria
(except +1) increases with increasing validity for each criterion or both
criteria for a given $r_{a^*b^*}$ and decreases as the correlation between
the two predicted scores ($r_{a^*b^*}$) increases, for given validities.

*The reader who compares equation (37) with equation (16) in Chapter 7
of R. L. Thorndike’s Personnel Selection may wonder at the lack of
correspondence over and above the difference in notation. The writer believes Thorndike’s
equation to be in error because of his assumption that both the actual criterion
measures and the predicted criterion scores have variances of unity. If $A$ is a
predicted criterion score and $a$ is the actual criterion score, then $r_{Aa}$ is a
multiple correlation, and clearly both $A$ and $a$ may not have variances of unity
unless the correlation is perfect. Also, the equations differ in that Thorndike
has not carried out simplifications of his correlation terms $r_{aB}$ and $r_{Ab}$, both of
which may be re-stated as in the present development.

For purposes of examining (37) further, let us suppose that the
two validities $R_{a^*a}$ and $R_{b^*b}$ are equal. Then we may write

$$R^2_{d^*d} = R^2_{a^*a}\, \frac{2 - 2\, r_{a^*b^*}}{2 (1 - r_{ab})}. \tag{38}$$

From (38), it is clear that $R_{d^*d}$ can be large even when $r_{ab}$ is high,
provided $r_{a^*b^*}$ can be made small and the validities $R_{a^*a}$ and $R_{b^*b}$ are
substantial. In attempts to secure a high value for $R_{d^*d}$, not only
will high validity in the predictors be desirable, but also it will be
of the utmost importance to secure predictor scores such that their
intercorrelation makes the value of the quantity $(1 - r_{a^*b^*})$ as large
as possible. This quantity will be greatest for an $r_{a^*b^*}$ of negative
unity, and will decrease as $r_{a^*b^*}$ increases from negative unity to
positive unity.

In fact, it may accurately be stated that in determining $R_{d^*d}$
the magnitude of $r_{a^*b^*}$ will be of critical importance. Thus it can be
seen from (38) that so long as $r_{ab}$ is not unity, the multiple
correlation with the criterion-difference score will exceed those with the
two criteria as the intercorrelation between predictors becomes
lower in algebraic value than the criterion intercorrelation. It is also
clear from (37) and (38) that for the purpose of comparing test
batteries in effectiveness of differential prediction for a given pair
of criteria, $r_{ab}$ need not be known.

Some worked-out numerical examples of the application of (37)
may throw further light on the problem. In each example, the value
of the criterion intercorrelation is taken at .50, which makes the
denominator of (37) unity.





























                      Test           Criterion
                    1      2         a      b

Example 1.    1   1.00    .00       .30    .00       $R_{a^*a} = .30$
              2    .00   1.00       .00    .30       $R_{b^*b} = .30$
                                                     $r_{a^*b^*} = .00$
                                                     $R_{d^*d} = .42$

Example 2.    1   1.00    .60       .30    .00       $R_{a^*a} = .38$
              2    .60   1.00       .00    .30       $R_{b^*b} = .38$
                                                     $r_{a^*b^*} = -.60$
                                                     $R_{d^*d} = .67$

Example 3.    1   1.00    .00       .30    .10       $R_{a^*a} = .32$
              2    .00   1.00       .10    .30       $R_{b^*b} = .32$
                                                     $r_{a^*b^*} = .60$
                                                     $R_{d^*d} = .28$

Example 4.    1   1.00    .00       .30    .30       $R_{a^*a} = .42$
              2    .00   1.00       .30    .30       $R_{b^*b} = .42$
                                                     $r_{a^*b^*} = 1.00$
                                                     $R_{d^*d} = .00$

Example 5.    1   1.00    .00       .30    .00       $R_{a^*a} = .58$
              2    .00   1.00       .50    .50       $R_{b^*b} = .50$
                                                     $r_{a^*b^*} = .86$
                                                     $R_{d^*d} = .30$
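
The worked examples can be reproduced with a short computation. The sketch below is an illustration, not part of the original paper: it assumes two standardized tests, obtains the regression weights for each criterion by solving the 2 x 2 normal equations, and then applies equation (37). The function names (`solve_2x2`, `differential_prediction`) are hypothetical.

```python
import math


def solve_2x2(matrix, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def differential_prediction(r_tests, valid_a, valid_b, r_ab):
    """Return (R_a, R_b, r_pred, R_d) for two standardized tests.

    r_tests : correlation between test 1 and test 2
    valid_a : [r(test 1, a), r(test 2, a)]
    valid_b : [r(test 1, b), r(test 2, b)]
    r_ab    : correlation between the two criteria
    """
    tests = [[1.0, r_tests], [r_tests, 1.0]]
    beta_a = solve_2x2(tests, valid_a)       # regression weights for a
    beta_b = solve_2x2(tests, valid_b)       # regression weights for b
    var_a = dot(valid_a, beta_a)             # variance of a* = R_a^2
    var_b = dot(valid_b, beta_b)             # variance of b* = R_b^2
    cov_ab = dot(valid_a, beta_b)            # covariance of a* and b*
    R_a, R_b = math.sqrt(var_a), math.sqrt(var_b)
    r_pred = cov_ab / (R_a * R_b)            # correlation of a* and b*
    numerator = var_a + var_b - 2.0 * R_a * R_b * r_pred
    # Equation (37); max() guards against a tiny negative rounding error.
    R_d = math.sqrt(max(numerator, 0.0) / (2.0 * (1.0 - r_ab)))
    return R_a, R_b, r_pred, R_d


# Example 5 of the text: uncorrelated tests, criterion intercorrelation .50.
print(differential_prediction(0.0, [0.30, 0.50], [0.00, 0.50], 0.50))
```

With the inputs of Example 5 the four returned values agree with the tabled figures to two decimals.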


Question 4 
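Using the usual expression, the square of the standard error
of estimate for the criterion difference scores may be stated
$$\sigma^2_{d\cdot d^*} = \sigma^2_d\,(1 - R^2_{d^*d})\,. \tag{39}$$
Now it is clear that
$$R^2_{d^*d} = \frac{\sigma^2_{d^*}}{\sigma^2_d}\,, \tag{40}$$
so equation (39) may be written
$$\sigma^2_{d\cdot d^*} = \sigma^2_d - \sigma^2_{d^*}\,. \tag{41}$$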


By using (25), (23), (30), and (31), equation (39) may also 
be written 








or 


) 





WILLIAM G. MOLLENKOPF 417 


$$\sigma^2_{d\cdot d^*} = 2(1 - r_{ab}) - R^2_{a^*a} - R^2_{b^*b} + 2R_{a^*a}R_{b^*b}\,r_{a^*b^*}\,. \tag{42}$$


Equation (42) may also be written to include expressions for 
the errors of estimating a and b, as follows: 


Jee ito.) + 3V 1— ov ai? an v1— a Soa a (48) 
For the case in which ee = Ps ; ead (42) becomes 
¢ Jee tis, ja 2it 1,1. (44) 
Under the same assumption, equation (43) becomes 
o, f=2le 2—r i) t ute] (45) 
or 
(ee Se Oe (46) 


Equation (43) provides an answer to the fourth of the questions
proposed at the beginning of this paper, with equation (42) giving
an alternative expression. The variance of the errors of estimating
the criterion-difference score decreases with increasing validity in
each of the predictor scores, with a given criterion intercorrelation
and given predictor-score intercorrelation. On the other hand, for
given predictor validities and given criterion intercorrelation, the
error of estimating the criterion-difference score decreases as the
correlation of the two predictions decreases.
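
As a check on the algebra, the error variance of equation (42) can be compared with the product form of equation (39), taking the criterion-difference variance from (25) and the squared multiple correlation from (37). This is a minimal sketch with hypothetical function names; the inputs used in the last line are those of Example 1 above.

```python
def error_variance_42(R_a, R_b, r_pred, r_ab):
    """Error variance of the criterion-difference estimate, equation (42)."""
    return 2.0 * (1.0 - r_ab) - R_a ** 2 - R_b ** 2 + 2.0 * R_a * R_b * r_pred


def error_variance_39(R_a, R_b, r_pred, r_ab):
    """Same quantity via (39): variance of d times (1 - R_d squared)."""
    var_d = 2.0 * (1.0 - r_ab)                                   # (25) squared
    R_d_sq = (R_a ** 2 + R_b ** 2 - 2.0 * R_a * R_b * r_pred) / var_d  # (37)
    return var_d * (1.0 - R_d_sq)


# Validities .30 and .30, uncorrelated predictions, criteria correlating .50.
print(error_variance_42(0.30, 0.30, 0.0, 0.50))
print(error_variance_39(0.30, 0.30, 0.0, 0.50))
```

Both functions return the same value for any admissible inputs, which is the content of the step from (39) to (42).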


REFERENCES
1. Bennett, G. K., and Doppelt, J. E. The evaluation of pairs of tests for
guidance use. Educ. psychol. Meas., 1948, 8, 319-325.
2. Brogden, H. E. An approach to the problem of differential prediction.
Psychometrika, 1946, 11, 139-154.
3. Differential Aptitude Tests (Manual and eight tests). New York: The
Psychological Corporation, 1947.
4. Kelley, T. L. A new method for determining the significance of differences
in intelligence and achievement test scores. J. educ. Psychol., 1923, 14, 321-333.
5. Segel, D. Differential diagnosis. Baltimore: Warwick and York, 1934.
6. Thorndike, R. L. Personnel selection. New York: John Wiley, 1949.
7. Thorndike, R. L. The problem of classification of personnel. Psychometrika,
1950, 15, 215-235.


Manuscript received 1/4/50 
















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


DETERMINATION OF THE OPTIMUM NUMBER OF ITEMS TO 
RETAIN IN A TEST MEASURING A SINGLE ABILITY 


B. J. BEDELL 
COLONIAL SOCIAL SCIENCE RESEARCH LABORATORY 
TRINIDAD, B.W.I. 


It should be possible to make a test a more accurate measuring 
instrument by discarding some of the items which have been found, 
by item analysis, to be inefficient. If this is so, fresh and more reli- 
able scores from the already marked scripts would be obtained by 
neglecting the discarded items, and, in the future, the abbreviated 
test would be used. The present article offers two solutions, based 
on slightly different premises, to the problem of which items to dis- 
card when the test has been designed to measure a single ability. 
In a practical example these yield almost the same result, which 
is also the result obtained by a less rigorous grouping method de- 
scribed, which is by no means laborious to use. 


This is an attempt at a solution of a problem which has cropped
up a number of times during the construction of a battery of tests 
each aimed at measuring a single ability. After item analysis had 
been carried out yielding “efficiency coefficients” which are taken 
here as rough estimates of the correlations of each item with the 
sum of all, and “facility values” which are the probabilities of items 
being answered correctly, it was necessary to decide which items to 
retain. It is a frequent habit among experimenters to discard items 
of facility greater than .8 or less than .2, and then further to dis- 
card those items whose efficiency values are less than some arbitrary 
figure. Where many more items have been made than are required 
in the test, it is possible to choose a comparatively high value for 
the minimum efficiency coefficient allowed. But it has been found in 
practice, with some tests devised to measure single abilities, very dif- 
ficult to devise many more items than are required in the test. If too 
many items are discarded, the reliability will fall. This is obvious 
if one considers the case of discarding all but the most efficient item. 
If too few are discarded the reliability will again fall as items with 
little power of measuring what the sum of all are intended to meas- 
ure will dilute the measuring power of the efficient items. This is 
obvious if we consider admitting items with zero efficiency. Of course 
if one were able and willing to weight each item with its regression 




coefficient one would have the best estimate of the total score, but 
this is not practicable. In practice one must weight each item 1 or 0. 
This necessary but crude weighting, inasmuch as it differs from 
weighting by regression coefficients, must involve errors. The pres- 
ent purpose is to determine which items to weight 1 so as to produce 
the best measure of the ability the items are designed to measure. 

Suppose an item analysis has been made and that items too difficult
(p < .2) or too easy (p > .8) have been discarded and that the
efficiency coefficients, $e_j$, which are crude, but for the present
discussion, taken as true measures of the correlations of individual
items with the sum of all, are arranged so that
$e_1 > e_2 > e_3 \cdots > e_k \cdots > e_n$
(where the items are $1 \cdots n$ and where $e_j = \rho_{j,\,1+2+\cdots+n}$).

To find the best value of k, the number of items to be retained 
in the test, one must select a criterion. The criteria here considered 
are the following: 


1. 1 + 2 + ... + k (A fallible criterion).


2. The intrinsic factor y accounting for all correlations within 
the test or hypothetical similar forms of it. (This is an infallible 
criterion). 


1. 1 + 2 + ... + k as criterion


This means that we are to find the value of k giving the greatest
reliability coefficient, i.e. $\rho_{1+2+\cdots+k,\,1+2+\cdots+k}$
greatest.* This criterion is the same as that used in evaluating first
factor loadings with centroid analysis.

On the assumption that correlations between items are due entirely
to the criterion we may take values
$\varepsilon_j\ (j = 1 \cdots k) = \rho_{j,\,1+2+\cdots+k}$
to represent the loadings of the items. From these it is possible to
construct a correlation matrix P which will be of unit rank with
elements $\varepsilon_i\varepsilon_j$. Using Spearman's theorem that if
3 is solely responsible for a correlation between 1 and 2, then

$$r_{12} = r_{13}\,r_{23}\,,\dagger \tag{1}$$


*This seems better than finding k for $\rho_{1+2+\cdots+k,\,1+2+\cdots+n}$
greatest, as this would yield a test of k items correlating most highly
with a criterion containing discarded items.

†From the well known formula for partial correlation,
$$r_{12\cdot 3} = \frac{r_{12} - r_{13}\,r_{23}}{\text{Denominator}}\,.$$
But when 3 is kept constant, on the assumption that 3 is solely
responsible for the correlation between 1 and 2, $r_{12\cdot 3} = 0$.







So that making further use of the assumption
$$\rho_{j,\,1+2+\cdots+n} = \rho_{j,\,1+2+\cdots+k}\;\rho_{1+2+\cdots+k,\,1+2+\cdots+n}\,,$$
or writing $\rho_{kn}$ for the last factor in the above,
$$e_j = \varepsilon_j\,\rho_{kn}\,. \tag{2}$$


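This is the connection between the observed efficiency values and
those which would be obtained after selection. (It may be noted here
that Greek letters are used where correlations are based on the
$\varepsilon$'s). From the correlation matrix P a pooling square may be
constructed for the n items of the test and the n′ items of a similar
form. From this*
$$\rho_{1+2+\cdots+k,\,1+2+\cdots+k} = \rho_{kk} = \frac{\left(\sum^{k}\sigma_j\varepsilon_j\right)^2}{\left(\sum^{k}\sigma_j\varepsilon_j\right)^2 - \sum^{k}(\sigma_j\varepsilon_j)^2 + \sum^{k}\sigma_j^2} \tag{3}$$
(where $\sigma_j$ is the S.D. of item j).
If we assume that the values of the S.D.'s of items are uncorrelated
with the values of $\varepsilon$, it may be proved that†
$$\left(\sum^{k}\sigma_j\varepsilon_j\right)^2 = M_\sigma^2\left(\sum^{k}\varepsilon_j\right)^2 \quad \text{(where } M_\sigma \text{ is the mean of } \sigma_j\text{)},$$
$$\sum^{k}(\sigma_j\varepsilon_j)^2 = M_{\sigma^2}\sum^{k}\varepsilon_j^2 \quad \text{(where } M_{\sigma^2} \text{ is the mean of } \sigma_j^2\text{)},$$

*For $\left(\sum^{k}\sigma_j\varepsilon_j\right)^2$ is the sum of the
elements in the N.E. quadrant and this with $\sigma_j^2$ replacing the
principal diagonal elements is the sum of the elements in the N.W. or
S.E. quadrants.

†For if there is no correlation,
$$\sum^{k}\left(\sigma_j - M_\sigma\right)\left(\varepsilon_j - \frac{\sum^{k}\varepsilon_j}{k}\right) = 0\,.$$
(This is truly so if $k \to \infty$ and only approximately so in samples.)
Whence,
$$\sum^{k}\sigma_j\varepsilon_j - \frac{\sum^{k}\varepsilon_j}{k}\sum^{k}\sigma_j - M_\sigma\sum^{k}\varepsilon_j + kM_\sigma\frac{\sum^{k}\varepsilon_j}{k} = 0\,,$$
or,
$$\sum^{k}\sigma_j\varepsilon_j - \sum^{k}\varepsilon_j\,M_\sigma - M_\sigma\sum^{k}\varepsilon_j + M_\sigma\sum^{k}\varepsilon_j = 0\,,$$
i.e.,
$$\sum^{k}\sigma_j\varepsilon_j = M_\sigma\sum^{k}\varepsilon_j\,.$$
Writing $\sigma_j^2$, $\varepsilon_j^2$, $M_{\sigma^2}$ for $\sigma_j$,
$\varepsilon_j$, $M_\sigma$, the other relation is proved.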








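so that (3) becomes
$$\rho_{kk} = \frac{M_\sigma^2\left(\sum^{k}\varepsilon_j\right)^2}{M_\sigma^2\left(\sum^{k}\varepsilon_j\right)^2 - M_{\sigma^2}\sum^{k}\varepsilon_j^2 + kM_{\sigma^2}}\,. \tag{4}$$

But it can further be shown* that $M_\sigma^2$ is very nearly equal to
$M_{\sigma^2}$ within the limits of our facility values. Hence (4)
reduces to
$$\rho_{kk} = \frac{\left(\sum^{k}\varepsilon_j\right)^2}{\left(\sum^{k}\varepsilon_j\right)^2 - \sum^{k}\varepsilon_j^2 + k}\,. \tag{5}$$
Substituting the value of $\varepsilon_j$ given in (2) this becomes
$$\rho_{kk} = \frac{\left(\sum^{k}e_j\right)^2}{\left(\sum^{k}e_j\right)^2 - \sum^{k}e_j^2 + k\,\rho^2_{kn}}\,. \tag{6}$$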


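*If p is the probability of an item being answered correctly and q is
the probability of its not being answered correctly, assuming a
binomial distribution,
$$\sigma^2 = pq = p(1 - p) = p - p^2$$
and
$$\sum^{n}\sigma^2 = \sum^{n}p - \sum^{n}p^2\,,$$
or
$$M_{\sigma^2} = M_p - M_{p^2}\,.$$
Also,
$$M_\sigma^2 = \left(\sum^{n}\sqrt{p - p^2}\right)^2 \Big/ n^2\,.$$
If p be taken to have the values .2, .3, .4, .5, .6, .7, .8,
$M_{\sigma^2}$ works out to be .2100 and $M_\sigma^2$ to be .2084. Or if
p be taken to have an infinite number of values between .2 and .8,
$$M_{\sigma^2} = \frac{\int_{.2}^{.8}(p - p^2)\,dp}{.8 - .2} = .2200$$
and
$$M_\sigma^2 = \left(\frac{\int_{.2}^{.8}\sqrt{p - p^2}\,dp}{.8 - .2}\right)^2 = \left(\Big[\tfrac{1}{2}\left(p - \tfrac{1}{2}\right)\sqrt{p - p^2} + \tfrac{1}{8}\sin^{-1}(2p - 1)\Big]_{.2}^{.8} \Big/ (.8 - .2)\right)^2 = .2192\,.$$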







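It is required now to obtain $\rho^2_{kn}$. From a pooling square in
which the columns and rows of the n − k items are given zero weights,
we may write
$$\rho_{kn} = \frac{\sum^{k}\varepsilon_j\,\sum^{n}\varepsilon_j}{\sqrt{\left\{\left(\sum^{k}\varepsilon_j\right)^2 - \sum^{k}\varepsilon_j^2 + k\right\}\left\{\left(\sum^{n}\varepsilon_j\right)^2 - \sum^{n}\varepsilon_j^2 + n\right\}}}\,,$$
which, upon substituting the value of $\varepsilon_j$ given in (2), becomes
$$\rho_{kn} = \frac{\sum^{k}e_j\,\sum^{n}e_j}{\sqrt{\left\{\left(\sum^{k}e_j\right)^2 - \sum^{k}e_j^2 + k\,\rho^2_{kn}\right\}\left\{\left(\sum^{n}e_j\right)^2 - \sum^{n}e_j^2 + n\,\rho^2_{kn}\right\}}}\,.$$
For brevity let $\left(\sum^{k}e_j\right)^2 - \sum^{k}e_j^2$ be called $A_k$,
and $\left(\sum^{n}e_j\right)^2 - \sum^{n}e_j^2$ be called $A_n$.

Then the last equation may be written
$$\rho_{kn} = \frac{\sum^{k}e_j\,\sum^{n}e_j}{\sqrt{(A_k + k\,\rho^2_{kn})(A_n + n\,\rho^2_{kn})}}\,. \tag{7}$$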


Squaring and cross-multiplying this yields
$$kn\,\rho^6_{kn} + (kA_n + nA_k)\,\rho^4_{kn} + A_kA_n\,\rho^2_{kn} - \left(\sum^{k}e_j\,\sum^{n}e_j\right)^2 = 0\,. \tag{8}$$


This is a cubic equation in $\rho^2_{kn}$, an approximate solution to
which can be obtained by noting that the first term is small compared
with the others. Neglecting this term gives a quadratic equation in
$\rho^2_{kn}$, the positive root of which is a first approximation to
the required root of (8). A closer approximation may be obtained by
using Newton's method of approximation.

The value of $\rho^2_{kn}$ is then to be substituted in (6) which may
now be written
$$\rho_{kk} = \frac{\left(\sum^{k}e_j\right)^2}{A_k + k\,\rho^2_{kn}}\,. \tag{9}$$
The value of k giving the highest value to $\rho_{kk}$ is thus determined.

The first eight columns of Table 1 show part of the working
for a Geometrical Doesn't Belong Test. The 24 items have facility
values ranging from .82 to .19. It must be admitted that these were
not quite uniformly distributed, but the more efficient items tended
to be the easier ones. Table 2 shows the rest of the working for those
values of k near the maximum of $\rho_{kk}$ (the solutions of the cubic
equations are not given). By interpolation* the value of k for maximum
$\rho_{kk}$ was found to be 16.77, giving by linear interpolation
$e_{16.77} = .33$. Thus, items whose efficiency coefficients are as low
as .33 are to be retained in the test. If 17 items are retained, the
reliability of the test would be .8800. This is, of course, always
bearing in mind the assumption of a matrix of unit rank. It will be
shown later that the Spearman-Brown formula for the reliability of
lengthened tests, the proof of which is sometimes made to depend on the
correlations in the matrix being all equal, still applies.


In the 9th column of Table 1 are values of $r_{kk}$ computed simply
from
$$r_{kk} = \frac{\left(\sum^{k}e_j\right)^2}{A_k + k}\,, \tag{10}$$
which is the same as (5) except that $\varepsilon$ and $\rho$ have been
replaced by e and r. This means that the correlation matrix R has been
constructed using the observed e's as loadings in the criterion,
1 + 2 + ... + n, instead of the $\varepsilon$'s as loadings in the
criterion, 1 + 2 + ... + k. The value of e giving a maximum value to
$r_{kk}$ is found to be .3138, and still 17 items would be included in
the test. $r_{17,17}$ is not theoretically so accurate a measure of the
reliability as $\rho_{17,17}$ and, as would be expected, is less than
$\rho_{17,17}$, since the $\varepsilon$'s are what would be obtained as
efficiency values on a second item analysis if we were to use only 17
items. These $\varepsilon$'s would be greater than the original e's,
making the correlations of the matrix P greater than those of the
matrix R.

*To obtain the value of x making y = f(x) a maximum, being given the
coordinates $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ of points near the
maximum value of y, the value of x, (X), making y a maximum for the
parabola passing through the given points may be determined from
$$X = \frac{\lambda(x_1 + x_2) - c(x_1 + x_3)}{2(\lambda - c)}\,,$$
where
$$\lambda = \frac{x_1 - x_2}{x_1 - x_3}\,, \qquad c = \frac{y_1 - y_2}{y_1 - y_3}\,.$$
In the present instance

         k     $\rho_{kk}$    $e_j$
   1    16      .8797         .36
   2    17      .8800         .32
   3    18      .8792         .27

   $k_1 + k_2 = 33$,   $k_1 + k_3 = 34$,   $\lambda = .5$,
   $y_1 - y_2 = -.0003$,   $y_1 - y_3 = .0005$,   $c = -.6$.

From which X = 16.77, and $e_{16.77} = .33$ by linear interpolation.
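
The interpolation rule in the footnote above is easily mechanized. The following sketch (function names are hypothetical) evaluates the vertex of the parabola through three points using the footnote's $\lambda$ and c, and then interpolates linearly for the corresponding efficiency coefficient. It assumes $y_1 \neq y_3$ and that the three points are not collinear.

```python
def parabola_vertex(p1, p2, p3):
    """Abscissa of the turning point of the parabola through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    lam = (x1 - x2) / (x1 - x3)
    c = (y1 - y2) / (y1 - y3)      # undefined when y1 == y3
    return (lam * (x1 + x2) - c * (x1 + x3)) / (2.0 * (lam - c))


def linear_interpolate(x, left, right):
    """Value at x on the straight line through the points left and right."""
    (x0, v0), (x1, v1) = left, right
    return v0 + (x - x0) * (v1 - v0) / (x1 - x0)


# Values of k and rho_kk near the maximum, as given in the footnote.
k_best = parabola_vertex((16, 0.8797), (17, 0.8800), (18, 0.8792))
e_best = linear_interpolate(k_best, (16, 0.36), (17, 0.32))
print(round(k_best, 2), round(e_best, 2))
```

Because only ratios of differences enter the formula, the three points need not be equally spaced.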

On the basis of this practical example, it would certainly not 
seem worth while using the laborious method based on equations (6) 
and (9) involving the solution of cubic equations, especially having 
regard to the crudeness of the values obtained in practice. However, 
an even greater saving of labour than that of the method of equation 
(10) is now to be described, the Method of Grouping. 

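If the correlation matrix R were duplicated N times, a matrix
of order nN × nN would result. From the pooling square thus able
to be constructed it follows that
$$r_{kk} = \frac{N^2M_\sigma^2\left(\sum^{k}e_j\right)^2}{N^2M_\sigma^2\left(\sum^{k}e_j\right)^2 - NM_{\sigma^2}\sum^{k}e_j^2 + kNM_{\sigma^2}}\,,$$
which, upon taking into consideration the fact that
$M_\sigma^2 \approx M_{\sigma^2}$, reduces to
$$r_{kk} = \frac{N\left(\sum^{k}e_j\right)^2}{N\left(\sum^{k}e_j\right)^2 - \sum^{k}e_j^2 + k}\,. \tag{11}*$$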
*Eq. (11) may be used for showing the Spearman-Brown formula for
reliability when the test is lengthened still applies, though the
correlations in the matrix are not equal. Applying (11) to (6) we obtain
$$\rho_{kk} = \frac{N\left(\sum^{k}e_j\right)^2}{N\left(\sum^{k}e_j\right)^2 - \sum^{k}e_j^2 + k\,\rho^2_{kn}}$$
for all values of N. Let $\rho_{kk} = \rho_{k,k}$ when N = 1 and
$= \rho_{Nk,Nk}$ when N = N.
Writing B = $\left(\sum^{k}e_j\right)^2$ and C for
$-\sum^{k}e_j^2 + k\,\rho^2_{kn}$, the equation becomes
$$\rho_{Nk,Nk} = \frac{NB}{NB + C} \quad \text{when } N = N,$$
and
$$\rho_{k,k} = \frac{B}{B + C} \quad \text{when } N = 1.$$
Eliminating C between these we get
$$\rho_{Nk,Nk} = \frac{N\rho_{k,k}}{1 + (N - 1)\rho_{k,k}}\,,$$
the Spearman-Brown formula.
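
The elimination of C in this footnote can be confirmed numerically. In the sketch below the values of B and C are arbitrary illustrative numbers, not taken from the paper, and the function names are hypothetical.

```python
def pooled_reliability(B, C, N=1):
    """Reliability N*B / (N*B + C) of a test duplicated N times."""
    return N * B / (N * B + C)


def spearman_brown(rho, N):
    """Spearman-Brown reliability of a test lengthened N times."""
    return N * rho / (1.0 + (N - 1) * rho)


B, C = 66.9, 9.1                      # illustrative values only
rho_single = pooled_reliability(B, C)
for N in (1, 2, 3, 4):
    print(N, pooled_reliability(B, C, N), spearman_brown(rho_single, N))
```

The two columns printed agree for every N, whatever positive values are chosen for B and C.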
















Taking the same example as before, we may write down our
efficiency coefficients in descending order, divide them into n = 6
groups of N = 4, and take the means of these six groups as our e's.
Thus, the mean of .68, .64, .64, and .55 is .6275 = $e_1$.

Table 3 shows the working. It is seen by interpolation that the
value of e for a maximum value of $r_{kk}$ is .3325, very little
different from what was obtained before. This calculation takes less
than 25 minutes if performed on a good calculating machine and less
again if one is only concerned with obtaining the number of items to
retain. It seemed necessary to establish the best solution to the
problem first before one could recommend more approximate and
abbreviated methods. There would seem to be no doubt that this last
method is the one to use for practical purposes.
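
The Method of Grouping amounts to evaluating equation (11) for k = 1, 2, ..., n with the group means in place of the individual e's. A minimal sketch follows, using the six group means of Table 3; the function name is hypothetical.

```python
def grouped_reliabilities(group_means, items_per_group):
    """r_kk from equation (11) for k = 1 .. n retained groups.

    group_means     : mean efficiency coefficient of each group,
                      in descending order
    items_per_group : N, the number of items in each group
    """
    results = []
    running_sum = 0.0
    running_sum_sq = 0.0
    for k, e in enumerate(group_means, start=1):
        running_sum += e
        running_sum_sq += e * e
        numerator = items_per_group * running_sum ** 2
        results.append(numerator / (numerator - running_sum_sq + k))
    return results


means = [0.6275, 0.5225, 0.4425, 0.3725, 0.2725, 0.0900]
r = grouped_reliabilities(means, 4)
best_k = max(range(len(r)), key=lambda i: r[i]) + 1
print(best_k, [round(value, 4) for value in r])
```

Here k counts groups, so the maximum at k = 4 corresponds to retaining about 16 items before interpolation.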

Table 4 shows the results of some fictitious examples for n = 9
and N = 6. (Total number of items = 54). Only those values of $r_{kk}$
near the maximum were worked out. The values of e for maximum
$r_{kk}$, obtained by finding k for maximum $r_{kk}$ by interpolation as
described and then e for this value by linear interpolation, are in
parentheses. The values in parentheses are the lowest values of the
efficiency coefficients to be used. It will be noted that when the
greatest efficiencies are large, only a few items are to be retained;
but when they are low, more items should be retained for greatest
reliability.


2. The intrinsic and infallible factor y accounting for all the corre- 
lations within the test or hypothetical similar 
forms of it, as criterion 

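Here let the intrinsic correlation between an item and y be
$\rho_{jy} = \varepsilon_j$. $\rho_{jj}$ is the correlation of an item on
one form of the test with an item of the same efficiency value on
another form. If y is the only cause of these being correlated, then
the reliability of j, $\rho_{jj}$ = the communality of j =
$\varepsilon_j^2$. Then by (1)
$$\rho_{j,\,1+2+\cdots+n} = \varepsilon_j\,\rho_{1+2+\cdots+n,\,y}$$
or, for brevity
$$\rho_{jn} = \varepsilon_j\,\rho_{ny}\,. \tag{12}$$
Also by (1)
$$\rho_{nn} = \rho^2_{ny}\,. \tag{13}$$
Hence,
$$\rho_{jn} = \varepsilon_j\sqrt{\rho_{nn}}\,. \tag{14}$$
From the attenuation equation,
$$\rho_{12} = \frac{r_{12}}{\sqrt{\rho_{11}\,\rho_{22}}}\,, \tag{15}$$
or,
$$r_{nn} = \rho^2_{nn}\,. \tag{16}$$
Eq. (14) on applying (15) and (16), becomes
$$\frac{r_{jn}}{\varepsilon_j\sqrt{\rho_{nn}}} = \varepsilon_j\sqrt{\rho_{nn}}\,,$$
i.e.,
$$\varepsilon_j^2 = \frac{e_j}{\sqrt{r_{nn}}} \quad (\text{where } r_{jn} = r_{j,\,1+2+\cdots+n} = e_j)\,. \tag{17}$$
Also,
$$\varepsilon_i\varepsilon_j = \rho_{ij} = \frac{r_{ij}}{\sqrt{\rho_{ii}\,\rho_{jj}}} = \frac{r_{ij}}{\varepsilon_i\varepsilon_j}\,.$$
Hence the following relations are established:
$$r_{ij} = \frac{e_i\,e_j}{r_{nn}} = \varepsilon_i^2\,\varepsilon_j^2 = \rho^2_{ij}\,. \tag{18}$$

The correlation matrix R of what would be observed correlations
if they were ever worked out, may be considered to be formed
from loadings $e_j/\sqrt{r_{nn}}$, giving, as in (18),
$$r_{ij} = \frac{1}{r_{nn}}\,e_i\,e_j\,.$$
Therefore, from the corresponding pooling square,
$$r_{nn} = \frac{\left(\sum^{n}e_j\right)^2}{A_n + n\,r_{nn}}\,,$$
giving,
$$n\,r^2_{nn} + A_n\,r_{nn} - \left(\sum^{n}e_j\right)^2 = 0\,, \tag{19}$$
whence $r_{nn}$ may be determined.

It is required to find the value of k for which
$$\rho_{1+2+\cdots+k,\,y} = \rho_{ky}$$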










is a maximum; or, as is here more convenient, for which $\rho^4_{ky}$ is
a maximum. By (15),
$$\rho^2_{ky} = \frac{r_{kk}}{\rho_{kk}}\,.$$
Hence
$$r_{kk} = \rho^4_{ky}\,. \tag{20}$$
And, by the pooling square,
$$r_{kk} = \frac{\left(\sum^{k}e_j\right)^2}{A_k + k\,r_{nn}}\,, \tag{21}$$
from which, knowing $r_{nn}$ from (19), the values of $\rho^4_{ky}$ may
be determined and the value of k for this to be a maximum ascertained.

The last three columns of Table 1 show the working for $\rho^4_{ky}$.
$r_{nn}$ was worked out from (19), which here became
$$24\,r^2_{nn} + 82.2904\,r_{nn} - 86.6761 = 0\,,$$
a positive root of which is
$$r_{nn} = .845033\,.$$

The value of k for $\rho^4_{ky}$ a maximum is 16.86, giving the
corresponding value of $e_{16.86} = .3256$, still very close to the
values previously obtained. There would seem to be no good reason,
therefore, to use any but the Grouping Method in practice.
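
The quadratic (19) and the reliability expression (21) are simple to evaluate directly. The sketch below is illustrative, with hypothetical function names; the only figures taken from the text are the coefficients of the quadratic for the 24-item example.

```python
import math


def full_test_reliability(n, A_n, sum_e_squared):
    """Positive root r_nn of n*r^2 + A_n*r - (sum of e)^2 = 0, equation (19)."""
    discriminant = A_n ** 2 + 4.0 * n * sum_e_squared
    return (-A_n + math.sqrt(discriminant)) / (2.0 * n)


def retained_reliability(sum_e_k_squared, A_k, k, r_nn):
    """r_kk for the first k items, equation (21)."""
    return sum_e_k_squared / (A_k + k * r_nn)


r_nn = full_test_reliability(24, 82.2904, 86.6761)
print(round(r_nn, 6))
```

Evaluating `retained_reliability` for each k and locating the largest value would complete the procedure described for the infallible criterion.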


Manuscript received 12/28/49 
















TABLE 1
Worksheet for a 24-item Geometrical Doesn't Belong Test

[Table body printed sideways in the original; the entries are not recoverable from this copy.]























TABLE 2
Worksheet for Values of k near the Maximum $\rho_{kk}$ in the
Geometrical Doesn't Belong Test

   k     $e_j$     $\rho^2_{kn}$    $k\rho^2_{kn}$    $A_k + k\rho^2_{kn}$    $\rho_{kk}$

  14     .36        .77816          10.8942            58.1132               .8772
  16     .36        .77905          12.4648            70.2246               .8797
        (.33)*
  17     .32        .77899          13.2428            76.0330               .8800
  18     .27        .77818          14.0072            81.2146               .8792
  19     .27        .77760          14.7744            86.5448               .8786

*The value of $e_j$ corresponding to the maximum $\rho_{kk}$.


TABLE 3
Worksheet for the Geometrical Doesn't Belong Test Using Six
Groups of Four Items in Each
(n = 6, N = 4)

(All sums run over the first k groups.)

  k    $e_j$     $\sum e_j$   $(\sum e_j)^2$   $N(\sum e_j)^2$   $\sum e_j^2$   $N(\sum e_j)^2 - \sum e_j^2$   $N(\sum e_j)^2 - \sum e_j^2 + k$   $r_{kk}$

  1    .6275     .6275        .3938            1.5752            .3938           1.1814                          2.1814                             .7221
  2    .5225    1.1500       1.3225            5.2900            .6668           4.6232                          6.6232                             .7987
  3    .4425    1.5925       2.5361           10.1444            .8626           9.2818                         12.2818                             .8260
  4    .3725    1.9650       3.8612           15.4448           1.0013          14.4435                         18.4435                             .8374
      (.3325)*
  5    .2725    2.2375       5.0064           20.0256           1.0756          18.9500                         23.9500                             .8361
  6    .0900    2.3275       5.4173           21.6692           1.0837          20.5755                         26.5755                             .8154

*The value of $e_j$ corresponding to the maximum $r_{kk}$.
TABLE 4
Results for Fictitious Examples Involving Nine Groups of Six Items in Each
(n = 9, N = 6)

   $e_j$    $r_{kk}$     $e_j$   $r_{kk}$     $e_j$    $r_{kk}$     $e_j$   $r_{kk}$     $e_j$    $r_{kk}$     $e_j$   $r_{kk}$

  .9000    .9624        .8     .9143       .7000    .8522       .600                 .5000                 .40
  .7875    .9677        .7     .9395       .6125    .9011       .525                 .4375                 .35
 (.6950)*
  .6750    .9678        .6     .9460       .5250    .9159       .450                 .3750                 .30
                       (.506)*
  .5625    .9662        .5     .9472       .4375    .9209       .375    .8831        .3125    .8267        .25
                                          (.3938)*             (.302)*
  .4500                 .4     .9457       .3500    .9210       .300    .8854        .2500    .8320        .20
                                                                                    (.2350)*              (.18)*
  .3375                 .3                 .2625    .9179       .225    .8828        .1875    .8301        .15
  .2250                 .2                 .1750                .150    .8759        .1250                 .10
  .1125                 .1                 .0875                .075                 .0625                 .05
  .0000                 .0                 .0000                .000                 .0000                 .00

*The value of $e_j$ corresponding to the maximum $r_{kk}$.































PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


A COMPARISON OF TWO PROCEDURES FOR CALCULATING 
DISCRIMINANT FUNCTION COEFFICIENTS 


JOHN SCHMID, JR. 
MICHIGAN STATE COLLEGE 


Alternative procedures for calculating discriminant function 
coefficients have been illustrated in reported research. One method 
proceeds from data which has been expressed in deviation score 
units, whereas the other method implies that the data has been ex- 
pressed in standard units. It is shown algebraically that both proce- 
dures yield identical discriminant function coefficients, and therefore, 
the method involving less mathematical manipulation and computa- 
tion is preferable. 


In recent years the discriminant function developed by Fisher 
(2) has been increasingly applied to problems involving the statisti- 
cal analysis of group differences. Essentially, the discriminant func- 
tion furnishes a set of weights by which the measurements of prop- 
erties of two groups of individuals may be linearly compounded in 
order to effect maximum discrimination between the two groups. The 
mathematical formulation of the function may be found in references
(3, 4, 5).

Two procedures have been followed in computing discriminant 
function coefficients. One procedure, illustrated by Garrett (3), pro- 
ceeds directly from the within-groups sums of squares and products 
of the measurements of the two groups. Subsequently, in this article, 
this procedure will be designated as Method I. Another procedure
for obtaining discriminant function coefficients has been illustrated 
by Cox and Martin (1) and Johnson (5). In this method, hereafter 
designated as Method II, the within-groups sums of squares and 
products are reduced to within-groups correlations, and the discrimi- 
nant coefficients are then obtained from these statistics. 

Frequently an investigator desires to establish a cutting score 
for the compound by which an individual for whom measurements 
are available may be classified into one of the two groups, or an in- 
vestigator may want to determine the relative importance of the vari- 
ates in the compound. The investigator may doubt, however, that 
coefficients yielded by Method I permit drawing inferences relative 
to these two problems because the within-groups sums of squares 




and products, being computed from deviation scores, apparently have 
not taken into account likely differences in variability of the several 
measurements. It is the purpose of this article to show that both 
methods yield identical coefficients, and consequently Method I is 
more expeditious in terms of time and computation as well as being 
less mathematically involved. A step-by-step comparison of the two 
methods follows. 


Method I: 


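1. The within-groups sums of squares and products for the two
groups are calculated. These values may be put in matrix notation
$$S = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1n} \\ S_{21} & S_{22} & \cdots & S_{2n} \\ \vdots & \vdots & & \vdots \\ S_{n1} & S_{n2} & \cdots & S_{nn} \end{bmatrix}, \tag{1}$$
where $S_{ij}$ is the within-groups sum of squares or products for
measurements i and j.
It has been shown (3, 5) that
$$S\lambda = d\,, \tag{2}$$
where
$$\lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix} \quad \text{and} \quad d = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}.$$
Here, the $\lambda$'s represent the discriminant coefficients and the
d's represent the mean differences of the measurements in the two groups.
2. The inverse of S is computed.
3. The matrix multiplication $S^{-1}d$ is performed, yielding the
discriminant coefficients as indicated by
$$\lambda = S^{-1}d\,. \tag{3}$$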


Method II: 


1. The within-groups sums of squares and products S are com- 
puted as in Method I. 

2. Let P be the matrix of the reciprocal square roots of the
principal diagonal elements in S:
$$P = \begin{bmatrix} \dfrac{1}{\sqrt{S_{11}}} & 0 & \cdots & 0 \\ 0 & \dfrac{1}{\sqrt{S_{22}}} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \dfrac{1}{\sqrt{S_{nn}}} \end{bmatrix}. \tag{4}$$

3. The within-groups sums of squares and products are reduced
to within-groups intercorrelations by pre- and postmultiplying
S by P as indicated in the operation
$$R = PSP\,, \tag{5}$$
$$R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{bmatrix},$$
where $r_{ij}$ = the within groups correlation.
4. Johnson (5) has shown that if a column matrix L is defined
by the equation
$$L = P^{-1}\lambda\,, \tag{6}$$
then a set of normal equations exists having the form
$$RL = Pd\,. \tag{7}$$

5. The inverse of R is computed and L is obtained by the matrix
multiplication
$$L = R^{-1}Pd\,. \tag{8}$$
6. The $\lambda$'s are then computed by the operation
$$\lambda = PL\,. \tag{9}$$

It may be seen that this $\lambda$ matrix is identical with the one
secured by Method I. By equation (5), $R^{-1} = P^{-1}S^{-1}P^{-1}$. If
this is substituted in equation (8), and if the resulting L is
substituted into equation (9), then
$\lambda = PP^{-1}S^{-1}P^{-1}Pd$.
But
$$PP^{-1} = P^{-1}P = I\,,$$
therefore
$$\lambda = S^{-1}d\,.$$


It is apparent that Method I furnishes identical results with 
Method II but possesses the virtue of requiring less computation. Pre- 
sumably, Method II was devised to assure standardization of units 
of measurement. The above demonstration indicates that this stand- 
ardization occurs in Method I automatically by computing the in- 
verse of S. 
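
The identity of the two methods can be illustrated on a small numerical case. The sketch below uses two variates and a made-up within-groups matrix S and difference vector d (both hypothetical, chosen only for illustration); the helper names are likewise hypothetical.

```python
import math


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(m, v):
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def method_one(S, d):
    """Discriminant coefficients directly from S, equation (3)."""
    return mat_vec(inverse_2x2(S), d)


def method_two(S, d):
    """Discriminant coefficients by way of correlations, equations (4)-(9)."""
    P = [[1.0 / math.sqrt(S[0][0]), 0.0],
         [0.0, 1.0 / math.sqrt(S[1][1])]]          # (4)
    R = mat_mul(mat_mul(P, S), P)                  # (5)
    L = mat_vec(inverse_2x2(R), mat_vec(P, d))     # (8)
    return mat_vec(P, L)                           # (9)


S = [[40.0, 12.0], [12.0, 10.0]]   # hypothetical sums of squares and products
d = [3.0, 1.5]                     # hypothetical mean differences
print(method_one(S, d))
print(method_two(S, d))
```

The two printed vectors agree to within rounding error, while Method I requires one matrix inversion and no rescaling.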


REFERENCES 

1. Cox, G. M., and Martin, W. P. Use of a discriminant function for differen- 
tiating soils with different azotobacter populations. Iowa State College Jour- 
nal of Science, 1939, 11, 328-82. 

2. Fisher, R. A., The use of multiple measurements in taxonomic problems. 
Ann. Eug., 1936-37, 7, 179-88. 

3. Garrett, H. E., The discriminant function and its use in psychology.
Psychometrika, 1943, 8, 65-79.

4. Hoel, P. G., Introduction to mathematical statistics. New York: John Wiley 
and Sons, Inc., 1947. 

5. Johnson, P. O., Statistical methods in research. New York: Prentice-Hall, 
Inc., 1949. 


Manuscript received 12/30/49 



















PSYCHOMETRIKA—VOL. 15, NO. 4 
DECEMBER, 1950 


THE VARIANCE ERROR OF THE P;.-DISCRIMINANT* 


GILBERT L. BETTS 
EDUCATIONAL TEST BUREAU 


The P,,-discriminant has been reported elsewhere in connection 


with its use in predicting whether selective service registrants if in- 
ducted would become normal operative soldiers or would commit 
offenses causing their imprisonment. The standard error of the P,,- 


discriminant is a good measure to use in determining how far to 
the side of this statistic a particular case falls. The standard error 
formula itself has also been published elsewhere; but its derivation, 
as the variance error, is given here. 


Occasionally a prediction can be improved by the use of a dis- 
criminant statistic. If an obtained measure falls on one side of such 
a point of reference, the prediction will be one thing; but if it falls 
on the opposite side, the prediction will be something else. The far- 
ther to the side the measure lies, the greater the certainty of predic- 
tion. A good unit to use in determining how far to the side the meas- 
ure lies is the standard error of the discriminant statistic. 

Several discriminant statistics are in common use, but the one
treated here, called the $P_{50}$-discriminant, was reported in connection
with the induction of Selective Service registrants during World War
II.† The Army’s problem was to induct registrants who would be
likely to become normal, operative soldiers and not to induct
registrants who would be likely to commit offenses causing their
imprisonment. In preparation for making these predictions, the Biographical
Case History‡ was administered to two criterion groups, normal
operative soldiers and imprisoned soldiers. The two distributions of
obtained scores overlapped. The discriminant statistic used in
predicting the category in which a registrant most probably belonged
was a raw score such that the proportion of cases in the normal group
falling below it was equal to the proportion of cases in the prison
group falling above it.

*The author gratefully acknowledges the very extensive assistance kindly
given to him by Dr. Truman L. Kelley and Dr. Frederick Mosteller. This
assistance was given without reference to the utility of the
$P_{50}$-discriminant, upon which matter the author reports elsewhere and
for which he takes full responsibility.

†Betts, Gilbert L., The detection of incipient Army criminals. Science, 1947,
106, 93-96; and Betts, Gilbert L., Test calibration for categorical classification.
Educ. psychol. Meas., 1949, 9, 269-279.

‡Privately published by the author.

In getting this discriminant statistic, in satisfaction of these 
conditions, two assumptions were made: (a) that the two popula- 
tion distributions from which samples were drawn are symmetrical, 
and (b) that one population can be obtained from the other by a 
linear transformation (translation and homogeneous stretching). 
When such is the case 


$$\frac{x_{50} - M_1}{\sigma_1} = \frac{M_2 - x_{50}}{\sigma_2}, \qquad M_2 > M_1, \tag{1}$$

where $x_{50}$ is the point below which 50 per cent of all cases fall when
the populations are weighted equally, $M_1$ and $M_2$ are the means of
the two populations, and $\sigma_1$ and $\sigma_2$ are the standard deviations of the
two populations.

Figure 1 shows the relationship graphically in the case of two
normal distributions with means of 0 and 1 and standard deviations
of 1 and 1/2, respectively. In this figure the percentage of population
1 below $x_{50}$, plus the percentage of population 2 below $x_{50}$, equals 100
per cent. This 100 per cent below $x_{50}$ represents one half of the 200
per cent in the two populations.

[FIGURE 1. The $P_{50}$-Discriminant Statistic in a Pair of Overlapping
Normal Populations. Horizontal axis from -3 to 3, with $M_1$, $x_{50}$,
and $M_2$ marked.]

Solving formula (1) for $x_{50}$ gives


$$x_{50} = \frac{\sigma_1 M_2 + \sigma_2 M_1}{\sigma_1 + \sigma_2}, \tag{2}$$

which, for the populations in Figure 1, equals 2/3.
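
The value 2/3 can be checked numerically. The following is a minimal sketch, assuming normal populations as in Figure 1; the function name `p50_cut_score` is a hypothetical label introduced here, not a name used by the author.

```python
from statistics import NormalDist


def p50_cut_score(m1, s1, m2, s2):
    """Cut score of formula (2): (s1*m2 + s2*m1) / (s1 + s2)."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (s1 * m2 + s2 * m1) / (s1 + s2)


# Figure 1 populations: means 0 and 1, standard deviations 1 and 1/2.
x50 = p50_cut_score(0.0, 1.0, 1.0, 0.5)

# Proportion of each (assumed normal) population lying below x50.
below_1 = NormalDist(0.0, 1.0).cdf(x50)
below_2 = NormalDist(1.0, 0.5).cdf(x50)

print(x50)                # the cut score
print(below_1 + below_2)  # should total one population (100 per cent)
```

The two proportions sum to one, which is the condition that the proportion of one group below the point equals the proportion of the other group above it.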
The estimate of $x_{50}$, obtained from samples drawn from the
populations, is

$$P_{50} = \frac{S_1 M_2 + S_2 M_1}{S_1 + S_2}, \tag{3}$$

where $M_1$ and $M_2$ are means obtained from samples, and $S_1$ and $S_2$
are standard deviations obtained from samples. Formula (3) may
be written

$$P_{50} = \frac{S_1}{S_1 + S_2} M_2 + \frac{S_2}{S_1 + S_2} M_1 = W_2 M_2 + W_1 M_1, \tag{4}$$

where the relation between the weights is

$$W_1 + W_2 = 1. \tag{5}$$


Except for this relationship between the weights, it is assumed that 
the means and weights in formula (4) are all statistically independ- 
ent of each other. 

The variance of $P_{50}$ can be written

$$V(P_{50}) = V(W_2 M_2) + V(W_1 M_1) + 2C[(W_1 M_1)(W_2 M_2)], \tag{6}$$


where V is the variance and C the covariance. Because of relation 
(5) between the weights, the covariance term does not vanish. 

When the expected value operator is used, the definition of a
variance is

$$V_x = E(x^2) - [E(x)]^2. \tag{7}$$

By definition the variance of $W_2 M_2$ is, therefore,

$$\begin{aligned}
V(W_2 M_2) &= E(W_2^2 M_2^2) - [E(W_2 M_2)]^2 \\
&= E(W_2^2)\,E(M_2^2) - [E(W_2)\,E(M_2)]^2 \\
&= [V(W_2) + W_2^2][V(M_2) + M_2^2] - (W_2 M_2)^2 \\
&= W_2^2 V(M_2) + M_2^2 V(W_2) + V(W_2) V(M_2).
\end{aligned} \tag{8}$$













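The variance of $M_2$, which is $\sigma_2^2/N_2$, is known; but the variance of
$W_2 = S_1/(S_1 + S_2)$ is needed. Since $S_1$ and $S_2$ are independent, the
following approximate relation can be used:

$$V(W_2) = V(S_1)\left(\frac{\partial W_2}{\partial S_1}\right)^2 + V(S_2)\left(\frac{\partial W_2}{\partial S_2}\right)^2. \tag{9}$$

Taking the derivatives

$$\frac{\partial W_2}{\partial S_1} = \frac{S_2}{(S_1 + S_2)^2} = \frac{W_2^2 S_2}{S_1^2}, \tag{10a}$$

$$\frac{\partial W_2}{\partial S_2} = \frac{-S_1}{(S_1 + S_2)^2} = -\frac{W_2^2}{S_1}, \tag{10b}$$

replacing the estimates by their true values, and using the large
sample normal approximations, $V(S_1) = V_1/2N_1$ and $V(S_2) = V_2/2N_2$,
where $\sigma_1^2 = V_1$ and $\sigma_2^2 = V_2$, we find

$$V(W_2) = \frac{W_2^4 \sigma_2^2}{2\sigma_1^2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right), \tag{11}$$

where $W_2 = \sigma_1/(\sigma_1 + \sigma_2)$. The variance of $W_1 M_1$ can of course be
obtained by symmetry from equation (8).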
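The third term of equation (6) is still needed.

$$\begin{aligned}
C[(W_1 M_1)(W_2 M_2)] &= E[(W_1 M_1)(W_2 M_2)] - E(W_1 M_1)\,E(W_2 M_2) \\
&= E(W_1 W_2)\,M_1 M_2 - W_1 W_2 M_1 M_2.
\end{aligned} \tag{12}$$

Now since

$$\begin{aligned}
E(W_1 W_2) = E[W_2(1 - W_2)] &= W_2 - W_2^2 - V(W_2) \\
&= W_1 W_2 - V(W_2)
\end{aligned} \tag{13}$$

and $V(W_1) = V(W_2) = V(W)$, therefore

$$C = -M_1 M_2 V(W). \tag{14}$$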


From the results of the three terms for the variance of $P_{50}$, as
given in formula (6),

$$\begin{aligned}
V(P_{50}) = {} & W_2^2 V(M_2) + M_2^2 V(W) + V(W) V(M_2) + W_1^2 V(M_1) \\
& + M_1^2 V(W) + V(W) V(M_1) - 2 M_1 M_2 V(W),
\end{aligned} \tag{15}$$

which may be rewritten

$$\begin{aligned}
V(P_{50}) = {} & W_2^2 V(M_2) + W_1^2 V(M_1) \\
& + V(W)\,[M_1^2 - 2 M_1 M_2 + M_2^2 + V(M_1) + V(M_2)].
\end{aligned} \tag{16}$$


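Replacing the variances by the expressions already given yields

$$\begin{aligned}
V(P_{50}) = {} & \frac{\sigma_1^2}{(\sigma_1 + \sigma_2)^2}\,\frac{\sigma_2^2}{N_2} + \frac{\sigma_2^2}{(\sigma_1 + \sigma_2)^2}\,\frac{\sigma_1^2}{N_1} \\
& + \frac{\sigma_1^2 \sigma_2^2}{(\sigma_1 + \sigma_2)^4}\left(\frac{1}{2N_1} + \frac{1}{2N_2}\right)\left[(M_1 - M_2)^2 + \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}\right].
\end{aligned} \tag{17}$$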


Neglecting the last two terms in the last bracket, because they
contribute only terms of order $1/N^2$, we may write the variance error
of $P_{50}$ as

$$\sigma^2(P_{50}) = \left(\frac{1}{N_1} + \frac{1}{N_2}\right)\frac{\sigma_1^2 \sigma_2^2}{(\sigma_1 + \sigma_2)^4}\left[(\sigma_1 + \sigma_2)^2 + \frac{(M_1 - M_2)^2}{2}\right], \tag{18}$$

in which obtained sample values may be used as estimates of
population values, for practical application.

In the use of this formula, it must be remembered that in its
derivation it was explicitly assumed that the number of cases in each
sample is large (say greater than fifteen). Furthermore, the
assumption of a normal distribution in both populations was implied. It is
realized that the samples with which practitioners deal will seldom
satisfy all these conditions. It is believed, however, that many
samples with which they deal approximate these conditions sufficiently
well to make the formula useful.
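
As an illustration of formulas (17) and (18), the sketch below evaluates both for the Figure 1 populations. The sample sizes of 100 are an assumption chosen only for the example, and the function names are hypothetical.

```python
def variance_error_full(m1, s1, n1, m2, s2, n2):
    """Formula (17): every term kept, including those of order 1/N^2."""
    total = s1 + s2
    w1 = s2 / total              # weight applied to M1
    w2 = s1 / total              # weight applied to M2
    v_m1 = s1 ** 2 / n1          # variance of the first mean
    v_m2 = s2 ** 2 / n2          # variance of the second mean
    v_w = (s1 * s2) ** 2 / total ** 4 * (1 / (2 * n1) + 1 / (2 * n2))
    return (w2 ** 2 * v_m2 + w1 ** 2 * v_m1
            + v_w * ((m1 - m2) ** 2 + v_m1 + v_m2))


def variance_error_p50(m1, s1, n1, m2, s2, n2):
    """Formula (18): the large-sample variance error of P50."""
    total = s1 + s2
    return ((1 / n1 + 1 / n2) * (s1 * s2) ** 2 / total ** 4
            * (total ** 2 + (m1 - m2) ** 2 / 2))


# Figure 1 populations with assumed (illustrative) sample sizes of 100.
approx = variance_error_p50(0.0, 1.0, 100, 1.0, 0.5, 100)
full = variance_error_full(0.0, 1.0, 100, 1.0, 0.5, 100)

print(approx, full, approx ** 0.5)  # variance errors and standard error
```

The difference between the two results is the neglected part of formula (17), which shrinks in proportion to $1/N^2$ as the samples grow.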


Manuscript received 4/1/49 
Revised manuscript received 12/5/49 










MACHINE SHORT-CUTS IN THE COMPUTATION OF CHI- 
SQUARE AND THE CONTINGENCY COEFFICIENT 


JOHN B. CARROLL 
HARVARD UNIVERSITY 
AND 


C. C. BENNETT 
BOSTON UNIVERSITY 


Rapid computational routines are presented for calculating $\chi^2$
from frequency data in the following cases: (1) test of goodness
of fit between an observed and a theoretical distribution; (2) test
of independence of distributions displayed in an r × c table; (3)
test of independence of distributions displayed in an r × 2 table. A
rapid method of computing the contingency coefficient also follows
from the procedure used in the second of these cases.


In 1939, DuBois* published an article on the calculation of the
chi-square test for goodness of fit. This paper presents a
simplification of DuBois’ method together with methods for computing
chi-square in other applications, and for calculating a contingency
coefficient. The methods presented here are particularly adapted to
more recent models of desk calculating machines, but they can also
be used in connection with older models.

TABLE 1
Data for $\chi^2$ Test of Goodness of Fit

Class                    Observed         Expected
Indices                  Frequencies      Frequencies
                         a                e
160 and 150 (pooled)       16               12
140                        55               47
130                       120              152
120                       330              352
110                       610              582
100                       719              688
 90                       592              579
 80                       338              350
 70                       130              150
 60                        48               46
50, 40, and 30 (pooled)    12               12
Total                    2970 = N         2970 = N

*DuBois, P. H. Note on the calculation of the chi-square test for “goodness
of fit.” Psychometrika, 1939, 4, 173-174.


Case I: $\chi^2$ test of goodness of fit.

It is assumed here that we already have the expected frequencies.
Suppose, then, that we have to find $\chi^2$ for such data as those in
Table 1, adapted from Table 37 of McNemar’s recent text.*

In this case, $\chi^2$ can be obtained from the formula†

$$\chi^2 = \sum \frac{a^2}{e} - N. \tag{1}$$


This quantity may be obtained on modern calculating machines 
in the following manner: 


(1) Decide how many decimal places are required in the answer appear- 
ing in the machine. Allow at least two places for systematic errors introduced 
by the inability of the machine to round off quotients. We will assume in this 
case that 4 places are required in the machine answer, the answer being accu- 
rate to two places when rounded off. 

(2) For the first pair of a and e values, put a in the keyboard with the 
number of decimal places determined above. In our problem, we put 16.0000 in 
the keyboard. 

(3) Square a so that the product will read positively in the product dials
with twice the number of decimal places determined above. However, before
doing this, set the counter-dial reverse key so that the complement of a will
read in the counter dials. The result for the first $a^2$ will be as follows:

$a^2$ = 256.00000000 in the product dial,

$-a$ = 999984.0000 in the counting or quotient dial.

The machine should be operated so that the carriage will not return to No. 1 
position at the end of this multiplication. 

(4) Clear the keyboard (accomplished automatically by most machines)
and divide the quantity in the product dial ($a^2$) by e, putting the latter in the
keyboard with the same decimal point as determined previously. Be sure the
machine is set so that the quantity in the quotient dial will not be cleared at the
start of the division operation. Divide positively, i.e., so that the quotient $a^2/e$
will roll positively into the quotient dial. The result appearing in the quotient
dial will be $-a + a^2/e$, or $-16 + 16^2/12 = 5.3333$ for the first pair of values
in our problem.

(5) Clear the keyboard and the product dial (accomplished automatically
by some machines) and set the quotient dial so that it will not clear in
subsequent operations. Then proceed as in Steps (2), (3), and (4), for each
successive pair of a and e values. At the end of this series of operations, the
quantity

$$\chi^2 = \sum \frac{a^2}{e} - \sum a = \sum \frac{a^2}{e} - N$$

will have accumulated on the quotient dial. For the present problem
$\chi^2$ = 17.0073. This is accurate to two places when rounded off. (When
seven decimal places are used, the value of $\chi^2$ appearing in the machine,
17.0076929, is accurate to five places.)

*McNemar, Q. Psychological statistics. New York: Wiley, 1949.

†It can easily be shown that this formula, as well as formulas (2), (3), and
(5), can be derived from the basic formula,

$$\chi^2 = \sum \frac{(a - e)^2}{e}.$$
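
The accumulation in Steps (1) to (5) of Case I can be imitated in a few lines. The sketch below uses the Table 1 frequencies; the function names are hypothetical, and the truncating version is only an assumed model of a machine that cannot round off quotients (it also assumes whole-number expected frequencies, as in Table 1).

```python
observed = [16, 55, 120, 330, 610, 719, 592, 338, 130, 48, 12]
expected = [12, 47, 152, 352, 582, 688, 579, 350, 150, 46, 12]


def chi_square_shortcut(a, e):
    """Formula (1): sum of a^2/e minus N (valid when sum(a) == sum(e))."""
    return sum(x * x / y for x, y in zip(a, e)) - sum(a)


def chi_square_basic(a, e):
    """Basic formula: sum of (a - e)^2 / e."""
    return sum((x - y) ** 2 / y for x, y in zip(a, e))


def chi_square_truncated(a, e, places=4):
    """Accumulate -a + a^2/e per class, truncating each quotient.

    Integer arithmetic stands in for the fixed-decimal quotient dial.
    """
    scale = 10 ** places
    dial = 0
    for x, y in zip(a, e):
        dial += (x * x * scale) // y - x * scale
    return dial / scale


exact = chi_square_shortcut(observed, expected)
check = chi_square_basic(observed, expected)
machine = chi_square_truncated(observed, expected)
print(exact, check, machine)
```

The short-cut and the basic formula agree only because the observed and expected totals are equal, which is why formula (1) subtracts N.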


Case II: $\chi^2$ test of independence of several distributions (two or
more), displayed in an r × c table.

TABLE 2
Worksheet for Computation of $\chi^2$ in an r × c Table

Category                 I           II          III         $N_i$
1                         24          56          71         151
2                         30          23          19          72
3                         35          25          12          72
$N_j$                     89         104         102         295 = N

Column values:
Without multiplier
non-entry               122.3283    140.7959    142.3979     $\chi^2$ = 295 [4.1242 - 3 - 1] = 36.6390
With multiplier
non-entry                33.3283     36.7959     40.3979     $\chi^2$ = 295 [1.1242 - 1] = 36.6390


Suppose we have the data in the first section of Table 2, which
are adapted from McNemar’s Table 36. A short-cut procedure for
obtaining $\chi^2$ without determining expected frequency values involves
the evaluation of

$$\chi^2 = N\left[\sum_{j=1}^{c} \frac{\sum_{i=1}^{r} \dfrac{a_{ij}^2}{N_i} + N_j}{N_j} - c - 1\right]. \tag{2}$$

A frequency in the ith row and jth column is denoted $a_{ij}$; $N_i$ denotes
a row total and $N_j$ a column total.
The numbers of rows and columns are denoted by r and c,
respectively. On machines with a multiplier-non-entry feature, formula
(2) can be modified to

$$\chi^2 = N\left[\sum_{j=1}^{c} \frac{\sum_{i=1}^{r} \dfrac{a_{ij}^2}{N_i}}{N_j} - 1\right]. \tag{3}$$
jar =; 
















The procedure for machines without a multiplier-non-entry feature 
is as follows: 


(1) Decide on the number of decimal places, n, required in the answer.
At least three decimal places should be allowed for systematic errors introduced
by the inability of the machine to round off quotients. Set up a fixed decimal
system with n places in the keyboard and the quotient dial and 2n places in
the product dial.


(2) Starting with the first column, put the a value of the first row in the
keyboard and multiply by a so that the product $a^2$ will appear positively in the
product dial with 2n decimal places, and so that the multiplier a will read
positively in the quotient dial with n decimal places. Clear the keyboard, put the $N_i$
of the first row in the keyboard as a divisor, with n decimal places. (It may be
necessary to position the carriage so that dividend and divisor are in proper
relative position, if the machine cannot be arranged to provide for this
automatically.) Press the positive division lever or levers so that $a_{ij}^2/N_i$ rolls
positively into the quotient dial. The result appearing in the quotient dial is
$a_{ij}^2/N_i + a_{ij}$, or $24^2/151 + 24 = 27.8145$ for the value of $a_{11}$ of our problem.
Clear the keyboard and product dial (accomplished automatically in the next
step by some machines).

(3) Set the quotient dial for non-clearance, i.e., accumulation, and follow
the procedure of step (2) for each successive value $a_{ij}$ in the first column
(j = 1), excluding the column total. Record the result appearing in the quotient
dial, which will be $\sum_{i=1}^{r} (a_{ij}^2/N_i) + N_j$, or 122.3283 for our problem.


(4) Clear the machine entirely and follow Steps (2) and (3) for each 
successive column in the problem (excluding, of course, the column of row to- 
tals). Record each column result at the foot of the column. 

(5) After the column results have been obtained, clear the machine
entirely and by cumulative division obtain the sum of the quotients of the column
values (obtained as above) and the $N_j$’s. This is

$$\frac{122.3283}{89} + \frac{140.7959}{104} + \frac{142.3979}{102} = 4.1242$$

for our problem. One greater than the number of columns is subtracted from
this value, and the result is multiplied by N. For our problem
$\chi^2 = 295\,(4.1242 - 3 - 1) = 36.6390$. This is very nearly accurate to one
decimal place. ($\chi^2$ accurate to three decimal places is 36.681.)

On machines with a multiplier-non-entry feature, the $a_{ij}$ values need not
be allowed to roll in the quotient dial in squaring the $a_{ij}$’s in Steps (2) and (3).
At the end of the computations for each column in Steps (3) and (4) the
quotient dials will give simply $\sum_{i=1}^{r} (a_{ij}^2/N_i)$. In Step (5), the cumulative quotient
of the column values and the $N_j$’s need be decreased only by 1 before
multiplying it by N to give $\chi^2$.
The completed worksheet will appear as in Table 2. 
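
The procedure of Case II can be sketched in code as follows. This is a minimal illustration of formula (3) using the Table 2 frequencies in full floating-point precision; the function names are hypothetical, and the conventional expected-frequency computation is included only as a check.

```python
table = [
    [24, 56, 71],
    [30, 23, 19],
    [35, 25, 12],
]


def chi_square_rxc(rows):
    """Formula (3): N * (S - 1), with no expected frequencies formed."""
    row_totals = [sum(row) for row in rows]
    columns = list(zip(*rows))
    col_totals = [sum(col) for col in columns]
    n = sum(row_totals)
    # S is the sum over columns of (sum over rows of a^2 / N_i) / N_j.
    s = sum(
        sum(a * a / n_i for a, n_i in zip(col, row_totals)) / n_j
        for col, n_j in zip(columns, col_totals)
    )
    return n * (s - 1.0), s


def chi_square_conventional(rows):
    """Check value: sum of (a - e)^2 / e with e = N_i * N_j / N."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    total = 0.0
    for row, n_i in zip(rows, row_totals):
        for a, n_j in zip(row, col_totals):
            e = n_i * n_j / n
            total += (a - e) ** 2 / e
    return total


chi2, s_value = chi_square_rxc(table)
print(chi2, s_value, chi_square_conventional(table))
```

Carried to full precision, the short-cut reproduces the three-decimal value quoted in Step (5) rather than the four-place machine result.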



















Case IIa: Calculation of a contingency coefficient from an r × c
table.

It is well known that the contingency coefficient is given by the
formula

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}.$$

If we define

$$S = \sum_{j=1}^{c} \frac{\sum_{i=1}^{r} \dfrac{a_{ij}^2}{N_i} + N_j}{N_j} - c = \sum_{j=1}^{c} \frac{\sum_{i=1}^{r} \dfrac{a_{ij}^2}{N_i}}{N_j},$$

the formula for the contingency coefficient can be rewritten as

$$C = \sqrt{(S - 1)/S}, \tag{4}$$

which utilizes quantities which are computed in the course of
evaluating formula (2) or formula (3) by the methods described above.
For the problem of Case II, C is computed as follows:

$$C = \sqrt{(.1242)/(1.1242)} = .3323.$$
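
Formula (4) can be checked against the usual definition of C. The sketch below recomputes S for the Table 2 frequencies; the function name `contingency_terms` is hypothetical, and because full precision is used the result differs slightly from the four-place machine figure.

```python
import math

table = [
    [24, 56, 71],
    [30, 23, 19],
    [35, 25, 12],
]


def contingency_terms(rows):
    """Return (S, N) where S is the quantity defined before formula (4)."""
    row_totals = [sum(row) for row in rows]
    columns = list(zip(*rows))
    n = sum(row_totals)
    s = sum(
        sum(a * a / n_i for a, n_i in zip(col, row_totals)) / sum(col)
        for col in columns
    )
    return s, n


s_value, n_total = contingency_terms(table)

# Formula (4): C from S alone.
c_from_s = math.sqrt((s_value - 1.0) / s_value)

# Usual definition: C from chi-square, using chi-square = N (S - 1).
chi2 = n_total * (s_value - 1.0)
c_from_chi2 = math.sqrt(chi2 / (chi2 + n_total))

print(c_from_s, c_from_chi2)
```

The two expressions agree because N cancels from the ratio once $\chi^2 = N(S - 1)$ is substituted.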


Case III: $\chi^2$ test of independence of two distributions.

TABLE 3
An r × 2 Table

Response                  Group                    Row Totals
Category           I               II              ($N_i$’s)
1                   27 = $a_1$      15              42
2                   26 = $a_2$      16              42
3                  247 = $a_3$     110             357
4                   41 = $a_4$       8              49
5                   39 = $a_5$      15              54
Column Totals      380 (= $N_1$)   164 (= $N_2$)   544 (= N)





Suppose we have the observed frequencies of variates in two 
samples and wish to test the independence of the two samples, as in 
the data given in Table 3 (adapted from Table 35 of McNemar’s text, 
cited previously). 










$\chi^2$ may be computed for this problem either by the method of
Case II or by the method described below. The latter is probably
more rapid even if the number of rows is small. In fact, the method
can be conveniently applied to a 2 × 2 table.

This method involves the evaluation of the formula

$$\chi^2 = \frac{N^2}{N_1 N_2}\left[\left(\sum_{i=1}^{r} \frac{a_i^2}{N_i} + N_1\right) - \left(N_1 + \frac{N_1^2}{N}\right)\right]. \tag{5}$$


The procedure is as follows: 








(1) Evaluate the term $N^2/N_1 N_2$ and record the answer to at least four
decimal places. For our problem, $N^2/N_1 N_2$ = 4.7486. In the succeeding steps
we use the a values in only one column.*

(2) Obtain the term $\left(\sum_{i=1}^{r} a_i^2/N_i + N_1\right)$ by the procedure outlined in Steps (2)
and (3) of Case II. Leave the result in the quotient dial, ready for the next
step. In this problem the quotient dial will read 646.8185 at this point.

(3) Clear the keyboard and the product dial. Then obtain $N_1^2$ so that the
result will appear positively on the product dial with 2n decimal places, and so
that during this operation the quotient dial will roll negatively. This has the
effect of subtracting $N_1$ from the quotient dial.

(4) Now divide the quantity in the product dial by N, using negative
division so that the quotient $N_1^2/N$ will be subtracted from the quotient dial. The
result appearing in the quotient dial is the bracketed term of the above formula
(1.3774 in this problem).

(5) Transfer this result to the keyboard and multiply it by the term
evaluated in step (1), namely $N^2/N_1 N_2$. For our problem the result is (4.7486)
(1.3774) = 6.5407.

On machines with a multiplier-non-entry feature the term $\sum_{i=1}^{r} a_i^2/N_i$ (Step 2)
can be obtained directly, as mentioned in connection with Case II. In Step (3)
the multiplier-non-entry key can be used to prevent the entry of $N_1$ and this
quantity will not have to be subtracted from the quotient dial. Thus, formula
(5) can be rewritten:

$$\chi^2 = \frac{N^2}{N_1 N_2}\left(\sum_{i=1}^{r} \frac{a_i^2}{N_i} - \frac{N_1^2}{N}\right).$$
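
The steps of Case III can be summarized in a short sketch that evaluates the rewritten form of formula (5) for the Table 3 frequencies. The function name `chi_square_rx2` is hypothetical, and full floating-point precision is used, so the last digits differ slightly from the four-place machine result.

```python
group_1 = [27, 26, 247, 41, 39]
group_2 = [15, 16, 110, 8, 15]
row_totals = [a + b for a, b in zip(group_1, group_2)]


def chi_square_rx2(column, totals):
    """Rewritten formula (5), using the frequencies of one column only."""
    n = sum(totals)
    n_1 = sum(column)        # total of the column that is used
    n_2 = n - n_1            # total of the other column
    bracket = sum(a * a / n_i for a, n_i in zip(column, totals)) - n_1 * n_1 / n
    return n * n / (n_1 * n_2) * bracket


from_first = chi_square_rx2(group_1, row_totals)
from_second = chi_square_rx2(group_2, row_totals)
print(from_first, from_second)
```

Either column gives the same value, which matches the footnote's remark that the second column may be used if the two column totals are transposed.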





The methods presented here are not recommended if it is desired
to apply a correction for continuity, in which case it will be
necessary to compute expected frequencies explicitly and apply
appropriate corrections. In fact, the major risk in the methods
developed here is that the computer may fail to see the need for
continuity corrections. It is recommended that computers scrutinize
small N’s in row and column totals in order to estimate whether
a correction for continuity should be applied, actually computing
expected frequencies in doubtful cases.

*In this illustration we will use the first column, although the second column
might be preferred by some computers. If the second column is used, $N_1$ and
$N_2$ are to be transposed.


Manuscript received 12/9/49 
Revised manuscript received 4/7/50 






















BOOK REVIEW 


JERZY NEYMAN (ed.) Proceedings of the Berkeley Symposium on Mathe- 
matical Statistics and Probability. Berkeley: University of California Press, 
1949. Pp. 501 + viii.* 


The Berkeley Symposium on Mathematical Statistics and Probability, made
possible by a special grant of the Administration of the University of California, 
brought together, during August, 1945, and January, 1946, a total of thirty-three 
specialists and tapped their knowledge of mathematical statistics and probabil- 
ity, theory and applications. 

These Proceedings are the tangible product of this effort, and consist of 
two papers on the logical and philosophical foundations of probability and sta- 
tistical inference; seven on analytical and computational mathematics of prob- 
ability and statistics; six on the theory and techniques of statistical inference 
as such; one on the teaching of statistics and its role in a university, with sup- 
plementary discussion; and thirteen others on applications of probability and 
statistics. Some of the papers, both theoretical and applied, present brand-new 
results; some, neater solutions to previously solved problems; and some review 
particular phases of probability and statistics. Applications considered explicitly, 
or alluded to, relate to problems in agriculture, animal breeding, astronomy, 
demography, economics, entomology, evolution, forestry, genetics, insurance, me- 
teorology, military strategy, philosophy, physics, population dynamics, psychol- 
ogy, and telephony. Discussion of some of these is limited to presentation of 
experimental and operational problems requiring statistical treatment; others 
serve to illustrate the use of probability and statistical theory as tools in the 
planning, conduct, and interpretation of experiments and inquiries relating to 
complex phenomena. 

All in all, this volume provides an interesting, valuable, and stimulating 
panoramic view of probability, mathematical statistics, and their applications — 
as of January, 1946. It is a “must” for libraries of institutions where research 
is conducted at the postgraduate level in the fields covered by the papers in- 
cluded. The individual scientist will probably find only a few papers of direct 
interest to him and will be content to rely on a library copy, unless he is inti- 
mately concerned with mathematical statistics. A few of the papers of more 
general interest are discussed briefly below. 

*The Review Editor acknowledges with regret that this review is a condensation, made
necessary by space limitations, of a more lengthy review which Dr. Eisenhart has written. The
complete review contains a more detailed discussion of many of the articles, frequently with extensions
by the reviewer, and historical orientations. Copies of the complete review are available by direct
correspondence with Dr. Eisenhart.

“The Place of Statistics in the University,” by Professor Harold Hotelling,
is his second paper on the teaching of statistics and its role in a university or
college. (The first, “The Teaching of Statistics,” was published in the
Annals of Mathematical Statistics, Vol. 11, No. 4, December, 1940.) In summing
up his paper, Professor Hotelling states:


The teaching of statistics, which has grown rapidly and seems likely
to grow much further still, has many unsatisfactory features. The chief
of these is the inadequate preparation in statistical theory of a large
proportion of those teaching the subject. The evils tend to be
perpetuated by the prevailing system of independent courses in elementary
statistical method scattered through numerous departments concerned with
applications. This system places the selection, supervision, and
promotion of teachers of statistical method and theory in the hands of those
who are not specialists in this subject. Teachers and prospective
teachers of the theory of statistics feel a pressure to divert their efforts away
from this theory and into its applications. In consequence, both
statistical theory and the underlying mathematics are slighted, with the result
that erroneous and inefficient methods continue to be taught and applied.

It is recommended that the preparation of teachers of statistical 
methods and theory be focused more definitely on this subject itself and 
the mathematics essential to it. Some study of a field of application, and 
practice in applications, are also desirable, but should not dominate the 
graduate curriculum in statistics. 

Organization of the teaching of statistical methods should be cen- 
tralized, and should provide also for the joint functions of research and 
of advice and service needed by others in the institution, and possibly 
outside it, regarding the statistical aspects of their problems of design- 
ing experiments and interpreting observations. Beginning courses in 
statistical methods and theory should be taught only under the super- 
vision of the central statistical organization, but courses in applied sta- 
tistics, requiring these beginning courses as prerequisites, might be 
taught in any department. Of these first courses there should be two, one 
based on calculus and the other requiring no mathematics beyond ele- 
mentary algebra. The more mathematical of these courses would be the 
more valuable, and efforts should be made to bring the larger number of 
students into it. The central statistical group would also teach more ad- 
vanced courses in the subject. 


Each time I have read Professor Hotelling’s papers on the teaching of sta- 
tistics I have come away with two strong impressions: first, that he is primarily 
interested in the development of statistics as a branch of mathematics and the 
production of statistical scholars; and second, that he ignores the problem of 
transfer of ideas, habits, and skills from one context to another. His insistence 
on a central first course in statistical method—or two such central courses, one 
requiring calculus and the other no mathematics beyond elementary algebra— 
is justified at present, I feel, by the acute shortage of persons adequately trained 
in modern statistical methods within the various subject-matter fields where sta- 
tistical method can be used to great advantage. As “statistical missionaries,” 
trained in such “first courses” and their sequels, become available in those fields, 
students should receive their first courses from these “missionaries,” who will 
be better fitted to reveal statistical method as a means to ends lying within the 
subject-matter field by considering problems of intrinsic interest to the class, by 
utilizing real data of familiar types, by assigning more work on material in the 
raw and less tidying-up of textbook exercises, and by pointing out simple ap- 
proximate methods that lead to the same conclusions or doubts as more elaborate 
methods involving, perhaps, more restrictive assumptions. When this day has 
come, the more mathematical of Professor Hotelling’s “first courses” might then 
be retained as the sole first course in statistics for prospective mathematics and 
statistics majors, and constitute a “second course” for students of other fields 
who wish to pursue statistics beyond their first courses. The ultimate need 
for “first courses” spread around the campus in addition to a central more 
mathematical “first course” appears to be attested to by several of Professor 
Hotelling’s discussants, who express a clear recognition of the fact that stu- 
dents who have had little or no research experience more readily acquire an 
appreciation — and a better understanding! — of statistical methods when the 
problems discussed and the illustrative data are drawn from familiar fields than 
when statistics is taught as a separate discipline. 

One of the most important papers in the Proceedings, from the viewpoint
of techniques of statistical inference, is Professor Wolfowitz’s contribution,
“Non-parametric Statistical Inference.” By way of introduction he writes, “In
most statistical problems treated in the literature a datum of the problem is the
information that the various distributions of the chance variables involved
belong to given families of distribution functions (d.f.’s) completely specified
except for one or more parameters. Non-parametric statistical inference is
concerned with problems where the d.f.’s are not specified to such an extent, and
where their functional form is unknown. This does not preclude some knowledge
of the d.f.’s; for example, we may know that they are continuous, uni-modal,
bi-modal, and the like.”

Professor Wolfowitz’s paper provides an interesting and readable survey of 
non-parametric procedures, with attention to such matters as the “consistency” 
and “efficiency” of non-parametric procedures. His is a “more or less heuristic 
and intuitive” presentation, making “no attempt at covering the entire field.” 

In an appendix to his paper, Professor Wolfowitz contributes a new result: 
a formula for the asymptotic variance of U, the total number of runs in a set 
of observations, when the observations are drawn from two different popula- 
tions. This result makes it possible to obtain a good approximation to the power 
function of the Wald-Wolfowitz test of whether two samples come from the same 
population “for alternatives subject to some slight . . . restrictions, when the 
sample sizes are large.” 

Dr. P. L. Hsu’s contribution, “The Limiting Distribution of Functions of 
Sample Means and Application to Testing Hypotheses,” is a very valuable addi- 
tion to the literature on asymptotic distributions of functions of sample mo- 
ments — a literature containing a wealth of related results which someone should 
take the trouble to consolidate for the benefit of all of us. 

In a previous paper,* Dr. Hsu gave some theorems for the case of functions 
of “means” (interpreted in-the-large, so as to include moments and product- 
moments) of any finite number of samples of the same size. In the first part 
of his Proceedings paper, he gives two theorems which extend these results to 
the case of “means” of samples of different sizes, his discussion being limited 
here (and in his previous paper) to situations where the limiting distribution 
is normal or is the distribution of a quadratic form in normal variables, noting, in the
latter instance, the circumstances under which the distribution reduces to a $\chi^2$
distribution. Examples considered explicitly include the $\chi^2$ test for goodness of
fit, the $\chi^2$ test of “homogeneity” in contingency tables, Student’s t, the L and
L, test functions, and Wilks’ test functions for testing the independence of k 


sets of random variables. In the second part of his paper, Dr. Hsu continues, 
in much the same vein, with a consideration of Hotelling’s T and Mahalanobis’
D (in both studentized and unstudentized forms) and formulates a systematic 
method of constructing test functions for either of two hypotheses of general 
character relating to multiple samples from multivariate distributions. 

The title of Professor Neyman’s own paper, “Contribution to the Theory of
the $\chi^2$ Test,” does not reveal what the paper is really about. While several
alternative definitions of the familiar symbol $\chi^2$ are discussed, the body of the paper
is divided into two parts, the first defining and discussing “a class of estimates
. . . termed best asymptotically normal estimates (BAN estimates, for short),
all having the same asymptotic properties as the maximum-likelihood estimates
but varying in the ease with which they can be computed” [italics ours], and
the second devoted to development of “a class of tests . . . which are all
equivalent in the limit to $\lambda$ tests.” Although the paper is primarily concerned with
techniques of estimation and testing statistical hypotheses that are optimal
(with peers but no superiors, and hence equally entitled to be called “best”)
when based on infinitely large samples, the title adopted is nevertheless
appropriate since “both the computation of BAN estimates and the application of the
statistical tests considered involve minimization of the alternatively defined $\chi^2$’s.”

A warning is sounded by Professor Lehmann in his paper, “Some Comments
on Large Sample Tests,” against concentrating too much attention on
asymptotic properties of statistical estimation and test procedures, that is, on
properties which are possessed only in the limiting case of samples of infinite size.
He constructs two examples, one relating to the mean of a normal distribution
and the other to the extent of the one-parameter uniform distribution, which
show that of two tests which are asymptotically equivalent as defined by
Neyman and both asymptotically “best” in either of two senses (“asymptotically
most powerful” and “asymptotically most stringent,” as defined by Wald), one
may be universally better than the other, i.e., more powerful with respect to
every member of the class of alternatives (to the null hypothesis) considered,
in samples of any finite size. He concludes: “Actually it seems doubtful that any
definition of optimum tests, based only on asymptotic properties of power
functions, can be very satisfactory, since in practice the sample size is always
limited and since obviously an asymptotic property implies nothing about the
behavior of any finite segment of a sequence of power functions.”

*P. L. Hsu. The limiting distribution of a general class of statistics. Science Record
(Academia Sinica), 1942, 1, 37-41.

This conclusion undoubtedly carries over with equal force to any definition 
of optimum estimators involving only asymptotic properties, e.g., BAN esti- 
mates, regarding which Professor Neyman remarks: “The question remains open 
concerning how good these estimates are when the number of observations is 
only moderate.” In this connection it needs to be emphasized that the method of 
maximum likelihood as developed by R. A. Fisher is not supported solely by the 
asymptotic properties of the estimators to which it leads, but has also to its 
credit that when sufficient statistics (I prefer the term “exhaustive estimators”) 
exist — a finite sample property — the method of maximum likelihood will lead 
to them. 
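
The finite-sample property just mentioned can be sketched with the factorization criterion for sufficiency. The notation below (the sample x, the statistic T, the factors g and h) is introduced here for illustration only and is not taken from the Proceedings; the normal-mean case is merely a convenient example.

```latex
% Illustrative notation (an assumption of this sketch, not the Proceedings'):
% x is the sample, T(x) a sufficient statistic for \theta, L the likelihood.
\begin{align*}
L(\theta; x) &= g\bigl(T(x), \theta\bigr)\, h(x),
  \qquad h(x) > 0 \text{ not depending on } \theta, \\
\hat{\theta}(x) &= \arg\max_{\theta} L(\theta; x)
  = \arg\max_{\theta} g\bigl(T(x), \theta\bigr).
\end{align*}
% Example: n independent N(\mu, \sigma^2) observations with \sigma^2 known,
% using \sum_i (x_i-\mu)^2 = \sum_i (x_i-\bar{x})^2 + n(\bar{x}-\mu)^2.
\begin{align*}
L(\mu; x) &= \exp\!\Bigl\{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\Bigr\}\,
  (2\pi\sigma^2)^{-n/2}
  \exp\!\Bigl\{-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}\Bigr\},
  \qquad \hat{\mu} = \bar{x}.
\end{align*}
```

Since the maximizing value depends on the sample only through T(x), the maximum-likelihood estimate, when it is unique, is a function of the sufficient statistic whatever the sample size.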

There are fashions in probability and statistics, just as there are fashions 
in everyday affairs. A short while ago sequential analysis was the rage in sta- 
tistics; a little later, the theory of games of strategy became the fad; and now 
the limelight seems to have shifted to stochastic processes and the fundamental 
mathematics of time series. The papers on these two subjects in the Proceed- 
ings, by Feller and Doob, respectively, are therefore very timely now, several 
years after their original presentation in the Symposium. 

For most people the best way to gain an understanding of a new subject 
is not by study of a formal treatment of it, but by analysis of dozens, or
hundreds, of examples. For such people, the present reviewer included,
Professor Feller’s brilliant and lucid survey of stochastic processes from the
viewpoint of applications, “On the Theory of Stochastic Processes, with Particular
Reference to Applications,” is precisely what is needed. His expository paper 
on stochastic processes is certainly one of the best, and from many standpoints 
probably the best paper in the Proceedings volume. 

Professor Doob’s paper, “Time Series and Harmonic Analysis,” is an ex- 
pository treatment of the fundamental mathematics of time series and harmonic 
analysis, and picks up the study of processes which evolve in time, space, or 
both, more or less where Professor Feller leaves off. The need for, and the aim 
of, this continuation is well expressed by Professor Doob in his opening
paragraph:


Although many articles on the present subject have appeared in the 
mathematical, statistical, and physical literature, there still seems to 
be some justification for one more. The statisticians have applied only 
small parts of the theory; the physicists have gone deeper, but write 
like physicists; the mathematicians have gone furthest, but write like 
mathematicians, only for posterity. Their work is frequently not under- 
stood, and is in general either ignored or applied in simplified forms 
which often are formally more formidable than the original rigorous 
one. The present paper attempts to give a compact outline of the har- 
monic analysis of stochastic processes, with applications to physical 
problems. 


While Professor Doob’s paper is for the most part “given over to the har- 
monic analysis of stochastic processes, the harmonic analysis of [non-stochastic] 
functions is outlined briefly, in order to exhibit the parallelism between the 
two . . . and the greater simplicity of the first.” Although some applications 
to problems in physics, electrical circuit theory, and the like, are given, the 
exposition is quite mathematical and considerably more difficult to follow than 
Professor Feller’s treatment of the more elementary types of stochastic processes. 

Before leaving the subject of time series, it should be noted that the
Proceedings contains a paper by G. F. McEwen on statistical procedures for
testing, “The Reality of Regularities Indicated in Sequences of Observations.”
Professor McEwen’s treatment of this subject is, however, fairly cursory and based
on the literature of the subject up to 1940, since which time there has been much 
activity and many publications in this field. 

In time it may be possible to assess the full influence of the Berkeley Sym- 
posium as a stimulus to “the return to theoretical research” in probability and 
mathematical statistics. But such an evaluation will be complicated by the fact 
that while the participants in the Symposium were “stimulated” in 1945-1946, 
other workers in probability and statistics, with the exception, perhaps, of stu- 
dents and immediate colleagues of the participants, had to wait until 1949 — 
three years later — to be “stimulated” by it and were not idle while they waited. 


National Bureau of Standards Churchill Eisenhart 


BOOKS RECEIVED 


S. HOWARD BARTLEY. Beginning Experimental Psychology. New York: McGraw-Hill
Book Co., 1950. Pp. 483 + vii.

RAYMOND B. CATTELL. Personality. New York: McGraw-Hill Book Co., 1950. 
Pp. 689 + xii. 

W. EDWARDS DEMING. Some Theory of Sampling. New York: John Wiley &
Sons, Inc., 1950. Pp. 602 + xvii. 

JAMES G. MILLER. Experiments in Social Process. New York: McGraw-Hill 
Book Co., 1950. Pp. 205 + ix. 

CLIFFORD T. MORGAN AND ELIOT STELLAR. Physiological Psychology (2nd edition). 
New York: McGraw-Hill Book Co., 1950. Pp. 609 + ix. 

PHILIP E. VERNON AND JOHN B. PARRY. Personnel Selection in the British
Forces. London: University of London Press, 1950. Pp. 324. 




















PSYCHOMETRIC SOCIETY

STATEMENT OF RECEIPTS AND DISBURSEMENTS FOR
FISCAL YEAR ENDED JUNE 30, 1950

RECEIPTS
Dues:                      Members          Student Members
Membership Year        No.     Amt. Pd.      No.     Amt. Pd.
1951 - - - - - - -       1     $   5.00
1950 - - - - - - -     301.75   1508.75     45.33    $ 136.00
1949 - - - - - - -      13        65.00      7          21.00
1948 - - - - - - -       1         5.00
1945 - - - - - - -       1         5.00
1944 - - - - - - -       1         5.00
Total  - - - - - -     318.75  $1593.75     52.33    $ 157.00
Total Dues Received - - - - - - - - - - - - - - - - - - -    $1750.75
MISCELLANEOUS RECEIPTS
For back issues (transferred to Corporation) - - - - - - -       5.00
Total Receipts - - - - - - - - - - - - - - - - - - - - - -   $1755.75

DISBURSEMENTS
OPERATING EXPENSES
Stationery and Postage - - - - - - - - - - - - -  $  91.50
Clerical - - - - - - - - - - - - - - - - - - - -     35.68
Phone - - - - - - - - - - - - - - - - - - - - -       8.32
Psychometric Corporation (90% of dues) - - - - -   1575.68
Total operating expenses - - - - - - - - - - - - - - - - -    1711.18
MISCELLANEOUS DISBURSEMENTS
Transfer of funds to Psychometric Corporation (for
back issues) - - - - - - - - - - - - - - - - - - - - - - -       5.00
Total Disbursements - - - - - - - - - - - - - - - - - - -    $1716.18

BALANCE
Excess of receipts over disbursements - - - - - - - - - -    $  39.57
Balance, June 30, 1949 - - - - - - - - - - - - - - - - - -     955.01
Bank Balance, June 30, 1950 - - - - - - - - - - - - - - -    $ 994.58


PSYCHOMETRIC CORPORATION

STATEMENT OF RECEIPTS AND DISBURSEMENTS FOR
FISCAL YEAR ENDED JUNE 30, 1950

RECEIPTS
Subscriptions:           Institutional        Individual
Year                    No.    Amt. Pd.      No.   Amt. Pd.
1949 - - - - - - -       23    $ 230.00
1950 - - - - - - -      277     2770.00       3    $ 15.00
Total  - - - - - -      300    $3000.00       3    $ 15.00
Total subscription payments - - - - - - - - - - - - - - -    $3015.00
Payments received for back volumes and issues - - - - - -      931.75
Total subscriptions and sales, at list price - - - - - - -   $3946.75
Less agency discounts - - - - - - - - - - - - - - - - - -      290.73
Net receipts from subscriptions and sales - - - - - - - -    $3656.02
Receipts from Psychometric Society - - - - - - - - - - - -    1575.68
MISCELLANEOUS RECEIPTS
Psychometric Monographs - - - - - - - - - - - -   $   4.12
Stopped checks - - - - - - - - - - - - - - - - -     20.00
Adjustment from printer - - - - - - - - - - - -      30.25
Overpayment received - - - - - - - - - - - - - -      5.00
Other receipts - - - - - - - - - - - - - - - - -      8.25
Total miscellaneous receipts - - - - - - - - - - - - - - -      67.62
TOTAL RECEIPTS - - - - - - - - - - - - - - - - - - - - - -   $5299.32

DISBURSEMENTS
OPERATING EXPENSES
Dentan Printing Company - - - - - - - - - - - -   $2679.16
Secretarial Services - - - - - - - - - - - - - -    449.57
Editorial Services - - - - - - - - - - - - - - -    762.65
Stationery and postage - - - - - - - - - - - - -    138.53
Phone calls - - - - - - - - - - - - - - - - - -       7.22
Expense of mail questionnaire - - - - - - - - -      32.14
Treasurer’s bond - - - - - - - - - - - - - - - -     25.00
Total operating expenses - - - - - - - - - - - - - - - - -   $4094.27
Reprinting back issue - - - - - - - - - - - - - - - - - -      268.00
MISCELLANEOUS DISBURSEMENTS
Psychometric Society for dues paid to Corporation $  15.00
Replacement of stopped checks - - - - - - - - -      19.00
Refund of overpayments - - - - - - - - - - - - -     10.00
Total miscellaneous disbursements - - - - - - - - - - - -       44.00
Total disbursements - - - - - - - - - - - - - - - - - - -    $4406.27

BALANCE
Excess of receipts over disbursements - - - - - - - - - -    $ 893.05
Bank balance, June 30, 1949 - - - - - - - - - - - - - - -     8336.48
Bank Balance, June 30, 1950 - - - - - - - - - - - - - - -    $9229.53
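
The arithmetic of the two statements above can be cross-checked with a short script. This is only a sketch: the variable names are hypothetical, and a few figures (30.25 for the printer's adjustment, 32.14 for the mail questionnaire, and a closing Corporation balance of 9229.53) are readings adopted on the assumption that the printed subtotals must add up.

```python
from decimal import Decimal as D, ROUND_HALF_UP


def total(*amounts):
    """Sum dollar amounts given as strings, using exact decimal arithmetic."""
    return sum((D(a) for a in amounts), D("0"))


# Psychometric Society
society_dues = total("1593.75", "157.00")            # members + student members
society_receipts = society_dues + D("5.00")          # plus back-issue receipts
share_to_corporation = (society_dues * D("0.90")).quantize(
    D("0.01"), rounding=ROUND_HALF_UP)               # "90% of dues"
society_operating = total("91.50", "35.68", "8.32", "1575.68")
society_disbursements = society_operating + D("5.00")
society_excess = society_receipts - society_disbursements
society_balance = D("955.01") + society_excess

# Psychometric Corporation
corp_subscriptions = total("3000.00", "15.00")
corp_list_total = corp_subscriptions + D("931.75")
corp_net_sales = corp_list_total - D("290.73")
corp_misc_receipts = total("4.12", "20.00", "30.25", "5.00", "8.25")
corp_receipts = corp_net_sales + D("1575.68") + corp_misc_receipts
corp_operating = total("2679.16", "449.57", "762.65", "138.53",
                       "7.22", "32.14", "25.00")
corp_misc_disbursements = total("15.00", "19.00", "10.00")
corp_disbursements = corp_operating + D("268.00") + corp_misc_disbursements
corp_excess = corp_receipts - corp_disbursements
corp_balance = D("8336.48") + corp_excess

print("Society:     receipts", society_receipts,
      "disbursements", society_disbursements,
      "closing balance", society_balance)
print("Corporation: receipts", corp_receipts,
      "disbursements", corp_disbursements,
      "closing balance", corp_balance)
```

Decimal arithmetic is used instead of floating point so that sums of cents compare exactly with the printed totals.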




















INDEX FOR VOLUME 15 


AUTHOR 

Adkins, Dorothy C., “A Superior Rotational Method in Factor Anal- 
ysis or Psychometricians in Government Service.” 331-338. 

Andrews, F. C. (with Z. W. Birnbaum and E. Paulson), “On the Ef- 
fect of Selection Performed on Some Coordinates of a Multi- 
Dimensional Population.” 191-204. 


Bedell, B. J., “Determination of the Optimum Number of Items to 
Retain in a Test Measuring a Single Ability.” 419-430. 

Bennett, C. C. (with John B. Carroll), “Machine Short-Cuts in the 
Computation of Chi-Square and the Contingency Coefficient.” 
441-447.

Betts, Gilbert L., “The Variance Error of the P5.-Discriminant.” 435- 
439. 

Birnbaum, Z. W., “On the Effect of the Cutting Score When Selection 
is Performed against a Dichotomized Criterion.” 385-389. 

Birnbaum, Z. W. (with E. Paulson and F. C. Andrews), “On the Ef- 
fect of Selection Performed on Some Coordinates of a Multi- 
Dimensional Population.” 191-204. 

Brogden, Hubert E., “J. P. GUILFORD and WILLIAM B. 
MICHAEL. The Predictions of Categories from Measurements: 
With Applications to Personnel Selection and Clinical Progno- 
sis.” A Review. 328-329. 

Bruner, J. S. (with L. Postman and F. Mosteller), “A Note on the 
Measurement of Reversals of Perspective.” 63-72. 

Carroll, John B., “ROBERT L. THORNDIKE. Personnel Selection: 
Test and Measurement Techniques.” A Review. 83-88. 

Carroll, John B. (with C. C. Bennett), “Machine Short-Cuts in the 
Computation of Chi-Square and the Contingency Coefficient.” 
441-447.

Comrey, Andrew L., “A Proposed Method for Absolute Ratio Scal- 
ing.” 317-325. 

Cottle, Wm. C., “A Factorial Study of the Multiphasic, Strong, Kuder, 
and Bell Inventories Using a Population of Adult Males.” 25-47. 

Deemer, W. L. (with D. F. Votaw, Jr., and J. A. Rafferty), “Estimation
of Parameters in a Truncated Trivariate Normal Distribution.”
339-347.

Edwards, Allen L. (with Paul Horst), “The Calculation of Sums of 
Squares for Interactions in the Analysis of Variance.” 17-24. 


Eisenhart, Churchill, “JERZY NEYMAN (Editor). Proceedings of 
the Berkeley Symposium on Mathematical Statistics and Prob- 
ability.” A Review. 449-453. 

Fay, Leo C. (with Palmer O. Johnson), “The Johnson-Neyman Technique,
Its Theory and Application.” 349-367.

Festinger, Leon, “QUINN McNEMAR. Psychological Statistics.” A 
Review. 209-213. 

Green, Bert F., Jr., “A Note on the Calculation of Weights for Maxi- 
mum Battery Reliability.” 57-61. 

Green, Bert F., Jr., “A Test of the Equality of Standard Errors of 
Measurement.” 251-257. 

Guilford, J. P., “TRUMAN LEE KELLEY. Fundamental Statistics.” 
A Review. 76-79. 

Guilford, J. P. (with William B. Michael), “Changes in Factor Load- 
ings as Tests are Altered Homogeneously in Length.” 237-249. 

Gulliksen, Harold, “The Reliability of Speeded Tests.” 259-269.

Gulliksen, Harold (with S. S. Wilks), “Regression Tests for Several 
Samples.” 91-114. 

Hamilton, C. Horace, “Bias and Error in Multiple-Choice Tests.” 
151-168. 

Horst, Paul, “A Note on Optimal Test Length.” 407-408. 


Horst, Paul (with Allen L. Edwards), “The Calculation of Sums of 
Squares for Interactions in the Analysis of Variance.” 17-24.

Horst, Paul (with Stevenson Smith), “The Discrimination of Two 
Racial Samples.” 271-289. 

Johnson, Helmer G., “Test Reliability and Correction for Attenua- 
tion.” 115-119. 

Johnson, Palmer O. (with Leo C. Fay), “The Johnson-Neyman Technique,
Its Theory and Application.” 349-367.

Kimball, Allyn W., “Sequential Sampling Tests for Use in Psycho- 
logical Test Work.” 1-15. 







Luce, R. Duncan, “Connectivity and Generalized Cliques in Sociomet- 
ric Group Structure.” 169-191. 


Lyman, John (with Pietro V. Marchetti), “A Device for Facilitating
the Computation of the First Four Moments About the Mean.” 
49-55. 


McNemar, Quinn, “On Festinger’s Review of Psychological Statis- 
tics.” 213-214. 


Marchetti, Pietro V. (with John Lyman), “A Device for Facilitating 
the Computation of the First Four Moments About the Mean.” 
49-55.

Michael, William B., “PALMER O. JOHNSON. Statistical Methods 
in Research.” A Review. 327-328. 


Michael, William B. (with J. P. Guilford), “Changes in Factor Load- 
ings as Tests are Altered Homogeneously in Length.” 237-249. 


Mollenkopf, William G., “An Experimental Study of the Effects on 
Item-Analysis Data of Changing Item Placement and Test Time 
Limit.” 291-315.

Mollenkopf, William G., “Predicted Differences and Differences Be- 
tween Predictions.” 409-417. 


Mosteller, F., “S. S. WILKS. Elementary Statistical Analysis.” A
Review. 73-76. 


Mosteller, F. (with J. S. Bruner and L. Postman), “A Note on the 
Measurement of Reversals of Perspective.” 63-72. 


Paulson, E. (with Z. W. Birnbaum and F. C. Andrews), “On the Effect
of Selection Performed on Some Coordinates of a Multi-Dimensional
Population.” 191-204.


Psychometric Corporation, Report of the Treasurer (June 1950). 456. 

Psychometric Society, Relation to the Federation of Statistical So- 
cieties. 205-207. 

Psychometric Society, Report of the Treasurer (June 1950). 455. 

Postman, L. (with J. S. Bruner and F. Mosteller), “A Note on the 
Measurement of Reversals of Perspective.” 63-72. 


Rafferty, J. A. (with D. F. Votaw, Jr., and W. L. Deemer), “Estima- 
tion of Parameters in a Truncated Trivariate Normal Distribu- 
tion.” 339-347. 







Reiersøl, Olav, “On the Identifiability of Parameters in Thurstone’s
Multiple Factor Analysis.” 121-149. 

Reiner, John M., “N. RASHEVSKY. Mathematical Biophysics.” A 
Review. 79-83. 

Schmid, John, Jr., “A Comparison of Two Procedures for Calculating
Discriminant Function Coefficients.” 431-434. 

Schultz, Douglas G., “The Comparability of Scores from Three Mathe- 
matics Tests of the College Entrance Examination Board.” 369- 
384. 

Smith, Stevenson (with Paul Horst), “The Discrimination of Two 
Racial Samples.” 271-289. 

Taylor, Calvin W., “Maximizing Predictive Efficiency for a Fixed 
Total Testing Time.” 391-406. 

Thorndike, Robert L., “The Problem of Classification of Personnel.” 
215-235. 

Votaw, D. F., Jr., (with J. A. Rafferty and W. L. Deemer), “Estima- 
tion of Parameters in a Truncated Trivariate Normal Distribu- 
tion.” 339-347. 


Wilks, S. S. (with Harold Gulliksen), “Regression Tests for Several 
Samples.” 91-114. 








